ABSTRACT

This paper attempts to provide the user of linear multiple regression
with a battery of diagnostic tools to determine which, if any, data points
have high leverage or influence on the estimation process and how these
possibly discrepant data points differ from the patterns set by the majority
of the data. The point of view taken is that when diagnostics indicate the
presence of anomalous data, the choice is open as to whether these data are
in fact unusual and helpful, or possibly harmful and thus in need of
modification or deletion.

The methodology developed depends on differences, derivatives, and
decompositions of basic regression statistics. There is also a discussion of
how these techniques can be used with robust and ridge estimators. An example
is given showing the use of diagnostic methods in the estimation of a
cross-country savings rate model.
1.  INTRODUCTION

1.1  General Goals

Economists and other model builders have responded willingly to major
opportunities that have appeared in the past two decades: a rapidly growing
demand for policy guidance and forecasts from government and business, and
the purely intellectual goal of advancing the state of knowledge through
model development. The fundamental enabling condition has been the ability
to produce more intricate models at decreasing unit cost because of advances
in computer technology. A large econometric model twenty years ago had
twenty equations: today a large model has a thousand equations. It is not
only larger models, but also larger data sets and more sophisticated
functional forms and estimators that have burgeoned.

The transition from slide rule and desk calculator to the large scale
digital computer has happened with startling speed. The benefits have, in
our opinion, been notable and at times exciting: we know a great deal more
about the economy and can provide more intelligent guidance as a direct
result of increased computational power. At the same time, there are
hidden costs of current approaches to quantitative economic research via
computer which ought to be recognized.

One major cost is that, today, the researcher is a great deal further
away from data than he was, perforce, in the heyday of the desk calculator.
If there are a great many equations to estimate or thousands of observations
for a few equations, there is a natural tendency to use the computer for what
it does well: process data. A tape arrives and after a frustrating day or
two is accessible by a computer program (often a regression package, plain or
fancy). Then estimation and hypothesis testing get underway until some
satisfactory conclusion is obtained. It is not misguided nostalgia to
point out that it was more likely, with the more labor intensive technology
of the past, for the researcher to uncover peculiarities in the data.
Nor do we counsel a return to the golden past. What concerns
us is that the "something" which has been lost in modern practice is
valuable and is not recoverable from standard regression statistics.
Our first major objective is to suggest procedures that exploit computer
brawn in new ways that will permit us to get closer to the character
of the data and its relation to hypothesized and estimated models.

There is the related issue of reliability. Our ability to crunch
large quantities of numbers at low cost makes it feasible to iterate
many times with a given body of data until the estimated model meets
widely accepted performance criteria in terms of statistical measures
such as t statistics, Durbin-Watson statistics and multiple correlations,
along with theoretically approved coefficient signs and magnitudes.
The iterative process is not what the statistical theory employed was
originally all about, so that it behooves us to consider alternative
ways of assessing reliability, which is a second major objective of this
paper.

Another aspect of reliability is associated with questions of distance
from the data that were mentioned at the outset. Specifically, the
closer one is to the data, the more likely it is that oddities in the
data will be uncovered or failure of the model and data to conform with
each other will be discernible, so that reliability can be increased.
At the same time, this poses a dilemma, since the researcher may then be
excessively prone to devise theories from data. This temptation, often
referred to as data mining, should be restrained. One sort of insurance
against data mining is to be a strict Bayesian and thus be guided by
sensible rules for combining prior and posterior information.
Alternatively the model should be tested, repeatedly if possible, on bodies
of data unavailable at the time. Being a strict Bayesian is not always
practical nor is it deemed to be universally desirable. As a general rule
then, the most practical safeguard lies with replication using previously
unavailable data.

1.2  Regression Diagnostics and Model Input Perturbations

This paper presents a different approach to the analysis of linear
regression. While we will sometimes use classical procedures, the
principal novelty is greater emphasis on new diagnostic techniques.
These procedures sometimes lack rigorous theoretical support,
but possess a decided advantage in that they will serve as yet unmet
needs of applied research. A significant aspect of our approach is
the development of a comprehensive set of diagnostics.

An important underlying concept is that of perturbing regression model
inputs and examining the model output response. We view model inputs broadly
to include data, parameters (to be estimated), error models and estimation
assumptions, functional form and a data ordering in time or space or
over other characteristics. Outputs include fitted values of the
dependent variable, estimated parameter values, residuals and functions
of these ($R^2$, standard errors, autocorrelations, etc.).

We plan to develop various types of input perturbations that will reveal
where model outputs are unusually sensitive. Perturbations can take the
form of differentiation or differencing, deletion (of data), or a
change in estimation or error model assumptions.

The first approach to perturbation is "differentiation" (in a
broad sense) of output processes with respect to input processes, in
order to find a rate of change. This will provide a first order measure
of how output is influenced by input; differences would be substituted
for derivatives in discrete cases. If the rate of change is large, it
can be a sign of potential trouble. Generally, one would like to have
small input perturbations lead to small output deformations. We would
also use this idea to see how big a perturbation can be before everything
breaks down. Of course, a "good" model is generally responsive to anticipated
changes in input.

For example, one could "differentiate" the model with respect to its
parameters to ascertain output sensitivity to small changes in the
parameters. (We could, for example, evaluate this parameter sensitivity
function at the estimated parameter values.) This might indicate some
of the more critical parameters in the model that deserve further
analysis.

A second procedure is to perturb the input data by deleting or
altering one data point and observe changes in the outputs. More generally
we can remove random groups of data points or, for time series, sequences
of data points. This is one way to search for parameter instability
over time. By deleting individual data points or collections of points
one can observe whether or not subsets of the data exert unusual influence
on the outputs. In particular, it is possible to establish if a minority
of the data behave differently from the majority of the data. The concept
of discrepant behavior by a minority of the data is basic to the diagnostic
view elaborated in this paper.

The third approach will be to examine output sensitivity to changes
in the error model. Instead of using least squares, estimators such as
least absolute residuals would be applied which impute less influence
to large residuals. A more promising alternative for diagnostic purposes
is the Huber type error model [1]. Varying a parameter in the Huber model
provides a way to examine sensitivity to changes in the error assumptions.
This area is related to recent research in robust statistics [17].

Another aspect of changed error assumptions is specific to time
series. Practicing econometricians are well aware that parameter estimates
change when the sample period is altered. While this might only
reflect expected sampling fluctuations, the possibility exists that the
population parameters are truly variable and should be modeled as a random
process. It is also possible that the population parameters are stable
but misspecification causes sample estimates to behave as if they were a
random process. In either case explicit estimation methods for random
parameters based on the Kalman filter might reveal parameter instability
of interest from a diagnostic point of view.

While classical statistical methods in most social science contexts treat
the sample as a given and then derive tests about model adequacy, we take the
more eclectic position that diagnostics might reveal weaknesses in the data,
the model or both. Several diagnostic procedures, for example, are designed to
reveal unusual rows or outliers in the data matrix which by assumption
has no formal distribution properties. If a suspect data row has been


to introduce a dummy variable, especially when subsequent examination
reveals that an "unusual" situation could have generated that data row.
Alternatively the model may be respecified in a more complex way. Of
course the suspicious row might simply be deleted or modified if found
to be in error. In summary, the diagnostic approach leaves open the
question of whether the model, the data or both should be modified.
In some instances described later on, one might discover a discrepant
row and decide to retain it, while at the same time having acquired
a more complete understanding of the statistical estimates relative
to the data.

1.3  Modeling Research Aims and Diagnostics

We reiterate here several principal objectives that diagnostics can
serve, from the modeler's perspective, in obtaining a clearer understanding
of regression beyond those obtainable from standard procedures. Some of
these are of recent origin or are relatively neglected and ought to be
more heavily emphasized. The three main modeling goals are detection
of disparate data segments, collinearity, and temporally unstable regression
parameters. It will become clear as this paper proceeds that overlaps
exist among detection procedures.

1.3.1  Leverage and Disparate Data

The first goal is the detection of data points that have disproportionate
weight, either because error distributions are poorly behaved or because
the explanatory variables have (multivariate) outliers. In either case
regression statistics, coefficients in particular, may be heavily dependent
on subsets of the data. (This draft is principally concerned with these
aspects of diagnosis: the other topics are of equal importance. At this
stage of our research we are coming to a better understanding of the
scope of regression diagnostics and we shall rely heavily on the work
of others in describing these other methods.)

1.3.2  Collinearity

While exact linear dependencies are rare among explanatory variables
apart from incorrect problem formulation, the occurrence of near dependencies
arises (all too) frequently in practice. While some collinearity can be
moderated by appropriate rescaling, in many instances ill-conditioning
remains. There are two separate issues, diagnosis and treatment. Since
our main purpose is diagnosis, we are not presently concerned with what
to do about it, except to note that the more collinear the data, the
more prior information needs to be incorporated.

Collinearity diagnosis is experimental too, but the most satisfactory
treatment we know of has been proposed by David Belsley [2], who builds
on earlier work of Silvey [3]. By exploiting a technique of numerical
analysts called the singular value decomposition, it is possible to
obtain an index of ill-conditioning and relate this to a decomposition
of the estimated coefficient variances. This relation enables the
investigator to locate which columns of the explanatory variable matrix,
associated with the index of collinearity, contribute strongly to each
coefficient variance. By thus joining Silvey's decomposition of the
covariance matrix to numerical measures of ill-conditioning, economists
now have an experimental diagnostic tool that enables an assessment of which
columns of the data matrix are prime sources of degradation in estimated
coefficient variances.
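The kind of computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of [2] or [3] as published: the helper name `collinearity_diagnostics`, the scaling of each column to unit length, and the example data are all hypothetical choices. The sketch reports the ratio of the largest singular value to each singular value (one possible index of ill-conditioning) and, for each coefficient, the share of its variance associated with each singular value.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (condition indices, variance-decomposition proportions).

    Hypothetical helper for illustration; X is n x p with full column rank.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: scale each column to unit length so units do not matter.
    Xs = X / np.linalg.norm(X, axis=0)
    # Thin SVD, Xs = U diag(mu) V^T, with mu sorted in decreasing order.
    _, mu, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = mu[0] / mu
    # var(beta_hat_k) is proportional to sum_j V[k, j]**2 / mu[j]**2.
    phi = (Vt.T ** 2) / mu ** 2
    proportions = phi / phi.sum(axis=1, keepdims=True)
    return cond_index, proportions

# Made-up example: the third column nearly duplicates the second.
t = np.arange(20.0)
X = np.column_stack([np.ones(20), t, t + 1e-3 * (-1.0) ** np.arange(20)])
cond_index, proportions = collinearity_diagnostics(X)
print(cond_index)          # the last index is very large
print(proportions[:, -1])  # variance shares tied to the smallest singular value
```

In this example the two nearly identical columns both show almost all of their coefficient variance attached to the smallest singular value, which is the pattern the diagnostic is meant to expose.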


1.3.3  Regression Parameter Variability in Time

A third major goal is the detection of systematic parameter variation
in time. Many statistical models assume that there exist constant but
unobservable parameters to be estimated. In practice, econometricians
often find this assumption invalid. Suspicions that there is more than
one set of population parameters can be aroused for a large number of
reasons: the occurrence of an external shock that might be expected to
modify behavior significantly (a war, hyperinflation, price-wage controls,
etc.) is one possibility. Another is that a poorly specified relation might
exclude important variables which change abruptly. There is always the
possibility that aggregation weights [4] may change over time and thereby
introduce variability in macro parameters even when micro parameters are stable.
An argument has been made by Lucas [23] that anticipated changes in government
policy will cause modifications in underlying behavior. Finally the parameters
may follow a random process and thus be inherently variable. When discrete
changes in parameters are suspected, and the sub-divisions of data where this
occurs are identifiable from outside information, the analysis of covariance in
the form discussed in Gregory Chow [5] or Franklin Fisher [5] is an appropriate
diagnostic that has been frequently applied. When the break point or points have
to be estimated, maximum likelihood estimators proposed by Quandt and Goldfeld
[7] [3] are available.

An alternative diagnostic procedure has recently been suggested by
Brown, Durbin and Evans [9]. They have designed two test statistics with
a time series orientation. From a regression formed by cumulatively adding
new observations to an initial subset of the data, one-step ahead
predictions are generated. Both the associated cumulated recursive residuals
and their sums of squares have well-behaved distributions on the null
hypothesis of parameter constancy.
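The recursive scheme just described can be illustrated with a short Python sketch. It is written in the spirit of [9] rather than as a transcription of it: the function name `recursive_residuals`, the scaling of each one-step-ahead prediction error by its estimated standard deviation factor, and the data are assumptions made for illustration. The sketch also assumes that the first p rows of X are linearly independent.

```python
import numpy as np

def recursive_residuals(X, y):
    """One-step-ahead prediction errors, scaled to constant variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.empty(n - p)
    for t in range(p, n):
        X_past, y_past = X[:t], y[:t]
        # Fit on the first t observations only.
        b_past, *_ = np.linalg.lstsq(X_past, y_past, rcond=None)
        x_new = X[t]
        # Variance factor of the forecast error: 1 + x (X'X)^{-1} x'.
        f = 1.0 + x_new @ np.linalg.solve(X_past.T @ X_past, x_new)
        w[t - p] = (y[t] - x_new @ b_past) / np.sqrt(f)
    return w

# Made-up data for illustration.
t = np.arange(12.0)
X = np.column_stack([np.ones(12), t])
y = 1.0 + 0.5 * t + 0.3 * np.sin(t)
w = recursive_residuals(X, y)
sigma_hat = np.sqrt(np.sum(w ** 2) / len(w))
cusum = np.cumsum(w) / sigma_hat               # cumulated recursive residuals
cusum_sq = np.cumsum(w ** 2) / np.sum(w ** 2)  # cumulated sums of squares
```

A useful check on such code is that the squared recursive residuals add up to the ordinary least-squares residual sum of squares for the full sample.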

1.4  Notation

We use the following notation:


Population Regression                                       Estimated Regression

$Y = X\beta + \varepsilon$                                  $Y = X\hat{\beta} + r$

$Y$ : $n \times 1$ column vector for dependent variable     same
$X$ : $n \times p$ matrix of explanatory variables          same
$\beta$ : $p \times 1$ column vector of regression coefficients     $\hat{\beta}$ : estimate of $\beta$
$\varepsilon$ : $n \times 1$ column error vector            $r$ : residual vector

Additional notation

$x_i$ : $i$-th row of $X$ matrix
$\sigma^2$ : error variance                                 $s^2$ : estimated error variance
                                                            $\hat{\beta}_{(i)}$ : $\beta$ estimated with $i$-th row of data matrix and $Y$ vector deleted.


Other notation is either obvious or will be introduced in a specific
context not so obviously tied to the generic regression model.

2.  LEVERAGE POINTS AND DISPARATE DATA


2.1  Introduction


At this stage in the development of diagnostic regression procedures,
we turn to analysis of the structure of the X matrix through perturbation of its
rows. In the usual case, the X's are assumed to be a matrix of fixed numbers
and the matrix to have full column rank. Otherwise, statistical theory
suggests we ought to have little interest in the X matrix, except when
experimental design considerations enter. In actual practice, researchers
pay a great deal of attention to explanatory variables, especially in initial
investigatory stages. Even when data are experimentally generated,
peculiarities in the data can impact subsequent analysis, but when data
are non-experimental, the possibilities for unusual data to influence
estimation are typically greater.

To be more precise, one is often concerned that subsets of the data,
i.e., one or more rows of the X matrix and associated Y's, might have a
disproportionate influence on the estimated parameters or predictions.
If, for example, the task at hand is estimating the mean and standard
deviation of a univariate distribution, exploration of the data will
often reveal outliers, skewness or multimodal distributions. Any one of
these might cast suspicion on the data or the appropriateness of the
mean and standard deviation as measures of location and variability.
The original model may also be questioned and transformations of the
original data consistent with an alternative model may be suggested, for
instance. In the more complicated multiple regression context, it is common
practice to look at the univariate distribution of each column of X as well
as Y, to see if any oddities (outliers or gaps) strike the eye. Scatter
diagrams are also examined. While there are clear benefits from sorting

detect multivariate discrepant observations. That weakness is what we
hope to remedy.

The benefits from isolating sub-sets of the data that might
disproportionately impact the estimated parameters are clear, but the sources
of discrepancy are diverse. First, there is the inevitable occurrence of
improperly recorded data, either at the source or in transcription to
computer readable form. Second, observational errors are often inherent
in the data. While more appropriate estimation procedures than least squares
ought to be used, the diagnostics we propose below may reveal the unsuspected
existence or severity of observational errors. Third, outlying data points
may contain valuable information that will improve estimation efficiency.
We all seek the "crucial experiment", which may provide indispensable
information, and its counterpart can be incorporated in non-experimental
data. Even in this situation, however, it is constructive to isolate
extreme points that indicate how much the parameter estimates lean on these
desirable data. Fourth, patterns may emerge from the data that lead to
a reconsideration and alteration of the initial model in lieu of suppressing
or modifying the anomalous data.

Before describing multivariate diagnostics, a brief two dimensional
graphic preview will indicate what sort of interesting situations might
be subject to detection. We begin by an examination of Figure 1, which
portrays the ideal null case of uniformly distributed and, to avoid statistical
connotations, what might be called evenly distributed X. If the variance of
standard test statistics contain the necessary information.

In Figure 2, the point o is anomalous, but since it occurs near the
mean of X, no adverse leverage effects are inflicted on the slope estimate
although the intercept will be affected. The source of this discrepant
observation might be in X, Y or $\varepsilon$. If the latter, it could be indicative
of heteroscedasticity or thick-tailed error distributions; clearly more
such points are needed to analyze those problems further, but isolating
the single point is constructive.

Figure 3 illustrates an instance of leverage where a gap arises
between the main body of data and the outlier. While it constitutes a
disproportionate amount of weight in the determination of $\hat{\beta}$, it might
be that benign third source of leverage mentioned above which supplies
crucially useful information. Figure 4 is a more troublesome configuration
that can arise in practice. In this situation the estimated regression
slope is almost wholly determined by the extreme point. In its absence,
the slope might be almost anything. Unless the extreme point is a crucial
and valid piece of evidence (which of course depends on the research
context), the researcher is likely to be highly suspicious of the estimate.
Given the gap and configuration of the main body of data, the estimate
surely has less than n-2 degrees of freedom: in fact it might appear that
there are effectively two data points altogether, not n.

Finally, the leverage displayed in Figure 5 is a potential source of
concern since o and/or • will heavily influence $\hat{\beta}$ but differently than the
remaining data. Here is a case where deletion of data, perhaps less
drastic downweighting, or model reformulation is clearly indicated.
2.2  Residual Diagnostics

Traditionally the examination of functions of the residuals,
$r_i = y_i - \hat{y}_i$, and especially large residuals, has been used to provide
indications of suspect data that in turn may unduly affect regression
results. It is best to have a scalar covariance matrix, so that
detection of heteroscedasticity or autocorrelation (and later on, eliminating
them) is desirable.

Approximate normality is another desirable property in terms of estimation
efficiency and the ability to test hypotheses. Harmful departures from normality
include pronounced skewness, multiple modes and thick-tailed error distributions.
Even moderate departures from normality can noticeably impair estimation
efficiency. At the same time, large outliers in error space will often be
associated with modest-sized residuals in least squares estimates since the
squared error criterion heavily weights extreme values.

It will often be difficult in practice to distinguish between
heteroscedasticity and thick-tailed error distributions; to observe the
former, a number of dependent variable values must be associated with
(at least) several given configurations of explanatory variables. Otherwise,
a few large residual outliers could have been generated by a thick-tailed
error distribution or fragments from a heteroscedastic distribution.

Relevant diagnostics have three aspects, two of which examine the
residuals and the third involving a change in error distribution assumptions.
The first is simply a frequency distribution of the residuals. If there
is evident visual skewness, multiple modes or a heavy tailed distribution,
the graph will prove informative. It is interesting to note that economists
often look at time plots of residuals but seldom at their frequency distribution.
The second is the normal probability plot, which displays the cumulative
normal distribution as a straight line whose slope measures the standard
deviation and whose intercept reflects the mean. Thus departures from
normality of the cumulative residual plot will show up in noticeable departures
from a straight line. Outliers will appear immediately at either end of the
cumulative distribution.
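The coordinates of such a plot are easy to compute. The Python sketch below is an illustration only: the function name `normal_plot_coordinates`, the plotting positions (i - 0.5)/n, and the residual values are assumptions, and other plotting positions are in common use. Sorted residuals are paired with standard normal quantiles, and a straight line fitted through the points gives a slope that estimates the standard deviation and an intercept that estimates the mean.

```python
from statistics import NormalDist

import numpy as np

def normal_plot_coordinates(residuals):
    """Return (standard normal quantiles, sorted residuals)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    # Assumed plotting positions; they lie strictly between 0 and 1.
    probs = (np.arange(1, n + 1) - 0.5) / n
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return q, r

# Made-up residuals; the last value is a deliberate outlier.
residuals = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 4.0]
q, r = normal_plot_coordinates(residuals)
slope, intercept = np.polyfit(q, r, 1)  # slope ~ std. deviation, intercept ~ mean
```

Plotting `r` against `q` gives the display described in the text; the outlier appears at the upper end, well off the line through the remaining points.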

Finally, Denby and Mallows [17] and Welsch [18] have suggested plotting
the estimated coefficients and residuals as the error density or, equivalently,
as the loss function (negative logarithm of the density) is changed. One
family of loss functions has been suggested by Huber [1]:

$$\rho_c(t) = \begin{cases} \tfrac{1}{2} t^2, & |t| \le c \\ c|t| - \tfrac{1}{2} c^2, & |t| > c \end{cases}$$

which goes from least-squares ($c = \infty$) to least absolute residuals ($c \to 0$). This
approach is attractive because of its relation to robust estimation [1], but
requires considerable computation.
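As a small illustration of how this family interpolates between the two criteria, the Python sketch below evaluates the loss for several values of c. The function name `huber_loss` and the sample values are assumptions made for illustration; the formula is the standard Huber loss, quadratic for small residuals and linear beyond the cutoff c.

```python
import numpy as np

def huber_loss(t, c):
    """Huber loss: t**2 / 2 when |t| is at most c, c*|t| - c**2 / 2 beyond c."""
    t = np.asarray(t, dtype=float)
    a = np.abs(t)
    return np.where(a > c, c * a - 0.5 * c ** 2, 0.5 * t ** 2)

t = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(huber_loss(t, 1e6))          # very large c: the least-squares loss t**2 / 2
print(huber_loss(t, 1.0))          # moderate c: large residuals are penalized linearly
print(huber_loss(t, 1e-6) / 1e-6)  # tiny c: proportional to the absolute residual
```

Refitting the regression over a grid of c values and plotting the coefficients and residuals against c is the kind of display the text refers to; the refitting step itself is not shown here.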

For diagnostic use the residuals can be modified in ways that will
enhance our ability to detect problem data. We first note that the $r_i$
do not have equal variances because if we let $H = X(X^T X)^{-1} X^T$, then

$$E[(Y-\hat{Y})(Y-\hat{Y})^T] = E[(I-H)\,YY^T\,(I-H)^T] = (I-H)\,E(YY^T)\,(I-H) = \sigma^2 (I-H)$$

since $(I-H)^2 = I-H$ and $(I-H)X = 0$. (See Theil [10] and Hoaglin and Welsch [13]
for a more detailed discussion.) Thus

$$\mathrm{Var}(r_i) = \sigma^2 (1 - h_i) \qquad (2.2.1)$$

where $h_i$ is the $i$-th diagonal element of $H$.
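The two matrix facts used in this argument are easy to confirm numerically. The sketch below is illustrative only: the helper name `hat_matrix`, the use of a QR factorization to form H, and the data are assumptions. It assumes X has full column rank.

```python
import numpy as np

def hat_matrix(X):
    """H = X (X^T X)^{-1} X^T, computed from a reduced QR factorization."""
    Q, _ = np.linalg.qr(np.asarray(X, dtype=float))
    return Q @ Q.T

# Made-up design: an intercept and one regressor with a distant last point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
H = hat_matrix(X)
M = np.eye(len(x)) - H
print(np.allclose(M @ M, M))    # checks that I - H is idempotent
print(np.allclose(M @ X, 0.0))  # checks that (I - H) X = 0
print(1.0 - np.diag(H))         # Var(r_i) / sigma^2 for each observation
```

The last line shows that the residual at the distant point has the smallest variance, which is why raw residuals are a poor guide to trouble at high-leverage points.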
Consequently a number of authors [11] have suggested that instead
of studying $r_i$, we should use the standardized residuals

$$r_{si} = \frac{r_i}{s \sqrt{1 - h_i}} \qquad (2.2.2)$$

where $s^2$ is the estimated error variance.

For diagnostic purposes we might want to go further and ask
about the size of the residual corresponding to $y_i$ when data point $i$ has
been omitted from the fit, since this corresponds to a simple
perturbation of the data. That is, we base the fit on the remaining
$n-1$ data points and then predict the value for $y_i$. This residual is

$$\tilde{r}_i = y_i - x_i \hat{\beta}_{(i)} \qquad (2.2.3)$$

and has been studied in a different context by Allen [12]. Similarly
$s_{(i)}^2$ is the estimated error variance for the "not $i$" fit, and the
standard deviation of $\tilde{r}_i$ is estimated by
$s_{(i)} \sqrt{1 + x_i (X_{(i)}^T X_{(i)})^{-1} x_i^T}$.
We now define the studentized residual:

$$r_i^* = \frac{y_i - x_i \hat{\beta}_{(i)}}{s_{(i)} \sqrt{1 + x_i (X_{(i)}^T X_{(i)})^{-1} x_i^T}} \qquad (2.2.4)$$

Since the numerator and denominator in (2.2.4) are independent,
$r_i^*$ has a t distribution with $n-p-1$ degrees of freedom. Thus
we can readily assess the significance of any single studentized residual.
(Of course, $r_i^*$ and $r_j^*$ will not be independent.) Perhaps even more
useful for our purposes is the fact that

$$r_i^* = \frac{r_i}{s_{(i)} \sqrt{1 - h_i}} \qquad (2.2.5)$$

and

$$(n-p-1)\, s_{(i)}^2 = (n-p)\, s^2 - \frac{r_i^2}{1 - h_i} \qquad (2.2.6)$$

These results are proved easily by using the matrix identities in Appendix 1.
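The practical value of (2.2.5) and (2.2.6) is that every studentized residual can be obtained from a single fit. The Python sketch below compares that shortcut with the definition (2.2.4), which refits the regression with one row removed. The function names and the data are hypothetical and chosen only for illustration; X is assumed to have full column rank even after any single row is deleted.

```python
import numpy as np

def studentized_residuals(X, y):
    """All r_i^* from one fit, using (2.2.5) and (2.2.6)."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)                 # diagonal of H
    s2 = r @ r / (n - p)
    s2_del = ((n - p) * s2 - r ** 2 / (1.0 - h)) / (n - p - 1)  # (2.2.6)
    return r / np.sqrt(s2_del * (1.0 - h)), s2_del              # (2.2.5)

def studentized_by_deletion(X, y, i):
    """r_i^* from the definition (2.2.4): refit without row i."""
    keep = np.arange(len(y)) != i
    Xi, yi = X[keep], y[keep]
    b_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    res = yi - Xi @ b_i
    s2_i = res @ res / (Xi.shape[0] - Xi.shape[1])
    scale = 1.0 + X[i] @ np.linalg.solve(Xi.T @ Xi, X[i])
    return (y[i] - X[i] @ b_i) / np.sqrt(s2_i * scale), s2_i

# Made-up data for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.3, 5.9, 7.2, 9.0])
X = np.column_stack([np.ones_like(x), x])
r_star, s2_del = studentized_residuals(X, y)
print(r_star[7], studentized_by_deletion(X, y, 7)[0])  # the two routes agree
```

The single-fit route costs one regression instead of n, which matters when the diagnostics are run routinely on large data sets.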

We think that a good way to examine residuals is
to look at the studentized residuals, both because they have equal
variances and because they are easily related to the t-distribution.
However this does not tell the whole story, since some of the most
influential data points can have relatively small studentized residuals
(and very small $r_i$).

To illustrate with the simplest case, regression through the origin, we
have

$$r_i^* = \frac{r_i}{s_{(i)}} \left( \frac{\sum_{j} x_j^2}{\sum_{j \neq i} x_j^2} \right)^{1/2} \qquad (2.2.7)$$

$$\hat{\beta} - \hat{\beta}_{(i)} = \frac{x_i r_i}{\sum_{j \neq i} x_j^2} \qquad (2.2.8)$$

where $(i)$ denotes an estimate obtained by removing the $i$-th row
(data point) from the computation. Thus the residuals are related to the
change in the least-squares estimate caused by deleting one row. But each contains
different information since large values of $|\hat{\beta} - \hat{\beta}_{(i)}|$ can be associated
with small $|r_i^*|$ and vice versa. Therefore we are led to consider row
deletion as an important diagnostic tool, to be treated on at least an
equal footing with the analysis of residuals.


In general,

$$\hat{\beta} - \hat{\beta}_{(i)} = \frac{(X^T X)^{-1} x_i^T r_i}{1 - h_i} \qquad (2.2.9)$$

where the $h_i$ are the diagonal elements of $H$, the least-squares
projection matrix defined earlier. We will call this the "hat" matrix since

$$HY = \hat{Y} = X\hat{\beta}. \qquad (2.2.10)$$

Clearly the hat matrix plays a crucial role not only in the studentized
residuals but also in row deletion and other diagnostic tools. We now develop some
important results (based on the discussion in Hoaglin and Welsch [13]) relating to
this matrix.
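Formula (2.2.9) gives the effect of deleting each row without refitting. The Python sketch below computes all n coefficient changes at once and checks one of them against a brute-force refit. The function name `coefficient_changes` and the data are assumptions made for illustration, and the explicit inverse is used only to keep the sketch close to the formula.

```python
import numpy as np

def coefficient_changes(X, y):
    """Row i of the result is beta_hat - beta_hat_(i), following (2.2.9)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    r = y - X @ b
    A = X @ XtX_inv                  # row i is ((X^T X)^{-1} x_i^T)^T
    h = np.sum(A * X, axis=1)        # diagonal of the hat matrix
    return A * (r / (1.0 - h))[:, None]

# Made-up data: the last point lies far from the others in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.3, 5.9, 7.2, 9.0])
X = np.column_stack([np.ones_like(x), x])
changes = coefficient_changes(X, y)

# Brute-force check for one row: refit without observation i.
i = 7
keep = np.arange(len(y)) != i
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
b_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
print(changes[i], b_full - b_del)    # the two vectors agree
```

Scanning the rows of `changes` for unusually large entries is one direct way to carry out the row-deletion diagnostic described in the text.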

2.3  The  Hat  Matrix 

Geometrically $\hat{Y}$ is the projection of $Y$ onto the p-dimensional
subspace of n-space spanned by the columns of $X$. The element $h_{ij}$ of $H$
has a direct interpretation as the amount of leverage or influence exerted
on $\hat{y}_i$ by $y_j$. Thus a look at the hat matrix can reveal sensitive points
in the X space, points at which the value of y has a large impact
on the fit.

The  influence  of  the  response  value  y-  on  the  fit  is  most  directly 
reflected  in  its  leverage  on  the  corresponding  fitted  value  y^. ,  and 
this  is  precisely  the  information  contained  in  h^- ,  the  corresponding 
diagonal  element  of  the  hat  matrix.  V/hen  there  are  two  or  fewer  explanatory/ 
variables  scatlrer  plots  will  quickly  reveal  any  x-cutliers ,  and  it  is 
net  hard  to  verify  that  they  have  relatively  large  h-  values.  'ATien 
p  >  2,  scatter  plots  may  not  reveal  "'r-ultivariate  outliers,"  which  are 
separated  Ln  p-space  from  the  buLk  of  the  x-coir.ts  but  do  not  appear  as 
outliers  in  a  plot  of  ar.y  single  exTl^nator-y  variable  or  pair  of  them 
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yet  •.-.-ill  ie  revealed  by  ar.  e>:aTina-*:i:.:i  of  H  .  Lcoking  at  the  diagonal 
elaT.ents  of  H  is  net  absolutely  co:.clusive  but  pix)vides  a  basic  iTarring 
point.  Even  if  there  ■.-■ere  no  hicder.  "ultivariare  outliers,  ccrr.puting 
and  exairdning  H  (especially  the  h^- )  is  usually  less  trouble  than 
lookir>g  at  all  possible  scatter  plots. 
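
The point about outliers that no single variable reveals can be sketched in a
few lines of Python. The data below are constructed purely for illustration
(they are not the savings data), and NumPy is an assumed dependency. The last
row lies inside the observed range of each explanatory variable taken
separately, yet it breaks the near-equality of the two variables and so
receives by far the largest hat-matrix diagonal.

```python
import numpy as np

# Twenty points with x2 almost equal to x1, plus one point that breaks the pattern.
t = np.linspace(-2.0, 2.0, 20)
x1 = np.append(t, 1.5)
x2 = np.append(t + 0.1 * (-1.0) ** np.arange(20), -1.5)
X = np.column_stack([np.ones(21), x1, x2])

Q, _ = np.linalg.qr(X)          # thin QR factorization, so H = Q Q^T
h = np.sum(Q * Q, axis=1)       # diagonal of the hat matrix

# The planted point is not extreme in either variable on its own ...
inside_x1 = x1[:-1].min() <= x1[-1] <= x1[:-1].max()
inside_x2 = x2[:-1].min() <= x2[-1] <= x2[:-1].max()
# ... but it has the largest leverage.
print(inside_x1, inside_x2, int(np.argmax(h)), h[-1])
```

Computing the diagonal from a QR factorization avoids forming the full n by n
matrix; that is a design choice of this sketch, not something prescribed by
the text.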

As a projection matrix, $H$ is symmetric and idempotent ($H^2 = H$).
Thus we can write

$$h_{ii} = \sum_{j=1}^{n} h_{ij}^2 = h_{ii}^2 + \sum_{j \ne i} h_{ij}^2 \tag{2.3.1}$$

and it is clear that $0 \le h_{ii} \le 1$. These limits are useful in
understanding and interpreting $h_i$ ($= h_{ii}$), but they do not yet tell us
when $h_i$ is "large". It is easy to show, however, that the eigenvalues
of a projection matrix are either 0 or 1 and that the number of non-zero
eigenvalues is equal to the rank of the matrix. In this case rank($H$) =
rank($X$) = p and hence trace $H$ = p, that is,

$$\sum_{i=1}^{n} h_i = p . \tag{2.3.2}$$

The average size of a diagonal element, then, is p/n. If we were designing
an experiment a desirable goal would be to have all the data points be about
equally influential or all $h_i$ nearly equal. Since the X data is given
to us and we cannot design our experiment to keep the $h_i$ equal, we will
follow [13] and say that $h_i$ is a leverage point if $h_i > 2p/n$. We shall
see later that leverage points can be both harmful and helpful.

The quantity 2p/n has worked well in practice and there is some
theoretical justification for its use. When the explanatory variables are
multivariate Gaussian it is possible to compute the exact distribution of
certain functions of the $h_i$. Let $\tilde X$ denote the $n \times (p-1)$
matrix obtained by centering the explanatory variables. Now

$$\hat Y - \bar Y = HY - \bar Y = \tilde H Y \tag{2.3.3}$$

and thus the diagonal elements of the centered hat matrix are

$$\tilde h_i = h_i - \frac{1}{n} . \tag{2.3.4}$$

Let $X_{(i)}$ denote $X$ with the $i$-th row removed and $\tilde X_{(i)}$
denote the centered version of $X_{(i)}$, i.e. means based on all but the
$i$-th observation have been subtracted out. Finally note that

$$x_i - \bar x = \frac{n-1}{n}\,(x_i - \bar x_{(i)}) \tag{2.3.5}$$

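and

$$\tilde X^T \tilde X - \tilde x_i^T \tilde x_i = \tilde X_{(i)}^T \tilde X_{(i)} + \frac{n-1}{n^2}\,(x_i - \bar x_{(i)})^T (x_i - \bar x_{(i)}) , \tag{2.3.6}$$

where $\tilde x_i = x_i - \bar x$ is the $i$-th row of $\tilde X$.
Using (A1.1) and (2.3.5)

$$\tilde h_i = \frac{\gamma}{1 + \gamma}$$

where

$$\gamma = \left( \frac{n-1}{n} \right)^2 (x_i - \bar x_{(i)}) \left( \tilde X^T \tilde X - \tilde x_i^T \tilde x_i \right)^{-1} (x_i - \bar x_{(i)})^T .$$

Again using (A1.1) and (2.3.6)

$$\gamma = \left( \frac{n-1}{n} \right)^2 \frac{a}{1 + \frac{n-1}{n^2}\,a}$$

where

$$a = (x_i - \bar x_{(i)}) \left( \tilde X_{(i)}^T \tilde X_{(i)} \right)^{-1} (x_i - \bar x_{(i)})^T .$$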


The distribution of $(n-2)a$ is well known since it is the Mahalanobis
distance between observation $i$ and the mean of the remaining observations
[13, p. 430]. Thus

$$a \sim \frac{n(p-1)}{(n-1)(n-p)}\, F_{p-1,\,n-p} . \tag{2.3.7}$$

Reversing the above algebraic manipulations we obtain

$$\tilde h_i = \frac{n-1}{n} \cdot \frac{a}{\frac{n}{n-1} + a}$$

and

$$h_i = \frac{(n-1)a + 1}{(n-1)a + n} .$$

Solving for $a$ gives

$$\frac{h_i - 1/n}{1 - h_i} = \frac{n-1}{n}\, a$$

and from (2.3.7)

$$\frac{n-p}{p-1} \cdot \frac{h_i - 1/n}{1 - h_i} \sim F_{p-1,\,n-p} . \tag{2.3.8}$$

For moderate p and moderate n the 95% point for F is near 2. Therefore,
a cut-off point would be

$$h_i > \frac{2(p-1) + \frac{n-p}{n}}{n + p - 2} \tag{2.3.9}$$

which is approximated by 2p/n.

From equation (2.3.1) we can see that whenever $h_i = 0$ or 1,
we have $h_{ij} = 0$ for all $j \ne i$. These two extreme cases can be
interpreted as follows. If $h_i = 0$, then $\hat y_i$ must be fixed at zero -
it is not affected by $y_i$ or by any other $y_j$. A point with $x_i = 0$
when the model is a straight line through the origin provides a simple
example.

When $h_i = 1$, we have $\hat y_i = y_i$ - the model always fits this data
value exactly. This is equivalent to saying that, in some coordinate
system, one parameter is determined completely by $y_i$ or, in effect,
dedicated to one data point. The following theorems are proved in appendix 3.

Theorem: If $h_i = 1$, there exists a nonsingular transformation, $T$,
such that the least-squares estimates of $\alpha = T^{-1}\beta$ have the
following properties: $\hat\alpha_1 = y_i$ and $\{\hat\alpha_j\}_{j=2}^{p}$
do not depend on $y_i$.

Theorem: If $X^T X$ is nonsingular, then

$$\det(X_{(i)}^T X_{(i)}) = (1 - h_i) \det(X^T X) . \tag{2.3.10}$$

Clearly when $h_i = 1$ the new matrix $X_{(i)}$ formed by deleting a row is
singular and we cannot obtain the usual least-squares estimates. This is
extreme
leverage and does not often occur in practice.

To complete our discussion of the hat matrix we give a few simple
examples. For the sample mean all elements of $H$ are $1/n$. Here
p = 1 and each $h_i = p/n$, the perfectly balanced case.

For a straight line through the origin

$$h_{ij} = x_i x_j \Big/ \sum_{k=1}^{n} x_k^2 \tag{2.3.11}$$

and clearly $\sum_{i=1}^{n} h_i = p = 1$.

Simple linear regression is slightly more complicated but a few
steps of algebra give

$$h_{ij} = \frac{1}{n} + \frac{(x_i - \bar x)(x_j - \bar x)}{\sum_{k=1}^{n} (x_k - \bar x)^2} \tag{2.3.12}$$

and $\sum_{i=1}^{n} h_i = 2$. We can see from (2.3.12) how x-values far from
$\bar x$ will lead to large values of $h_i$. It is this idea in the
multivariate case that we attempt to capture by looking at elements of the
hat matrix.
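
A small worked example may help fix these ideas for simple linear regression.
The sketch below uses plain Python and five made-up x-values (an assumption
of this illustration); the function names are hypothetical. It builds the hat
matrix from the closed form (2.3.12), checks (2.3.1) and the trace result,
applies the 2p/n rule, and confirms the determinant theorem (2.3.10).

```python
def hat_simple_regression(x):
    """Hat matrix for y = a + b*x, built from the closed form (2.3.12)."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [[1.0 / n + (xi - mean) * (xj - mean) / sxx for xj in x] for xi in x]


def det_xtx(x):
    """det(X^T X) for X = [1, x], i.e. n * sum(x^2) - (sum x)^2."""
    n = len(x)
    return n * sum(v * v for v in x) - sum(x) ** 2


x = [1.0, 2.0, 3.0, 4.0, 10.0]          # made-up values; the last is far from the mean
n, p = len(x), 2
H = hat_simple_regression(x)
h = [H[i][i] for i in range(n)]

trace = sum(h)                                             # should equal p = 2
row_norms = [sum(v * v for v in H[i]) for i in range(n)]   # (2.3.1): equals h_i
flagged = [i for i in range(n) if h[i] > 2.0 * p / n]      # leverage points by the 2p/n rule

# Theorem (2.3.10): deleting row i scales det(X^T X) by (1 - h_i).
det_gaps = [
    abs(det_xtx(x[:i] + x[i + 1:]) - (1.0 - h[i]) * det_xtx(x)) for i in range(n)
]
print(h, trace, flagged, max(det_gaps))
```

With these values the mean is 4 and the sum of squared deviations is 50, so
the point at x = 10 has $h_i = 1/5 + 36/50 = 0.92$ and is the only one above
the cut-off 2p/n = 0.8.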

2.4 Row Deletion Diagnostics

We now return to the basic formula

$$\hat\beta - \hat\beta_{(i)} = (X^T X)^{-1} x_i^T r_i / (1 - h_i) . \tag{2.4.1}$$

Since the variability of $\hat\beta_j$ is measured by
$s\,[(X^T X)^{-1}_{jj}]^{1/2}$, a more useful measure of change is

$$\mathrm{DFBETAS}_{ij} = \frac{(\hat\beta - \hat\beta_{(i)})_j}{s_{(i)}\,[(X^T X)^{-1}_{jj}]^{1/2}} \tag{2.4.2}$$

where we have replaced $s$ by $s_{(i)}$ in order to make the denominator
stochastically independent of the numerator in the Gaussian case. To provide
a summary of the relative coefficient changes we suggest

$$\mathrm{NDFBETAS}_i = \left( \frac{n-p}{p} \sum_{j=1}^{p} \mathrm{DFBETAS}_{ij}^2 \right)^{1/2} . \tag{2.4.3}$$

The term $(n-p)/p$ has been incorporated to make NDFBETAS more comparable
across data sets which may have different values of p and n. This
normalizing value was chosen because when $X$ is an orthogonal matrix (but
not necessarily orthonormal)

$$\mathrm{DFBETAS}_{ij} = \frac{x_{ij}\, r_i}{s_{(i)}\,(1 - h_i) \left( \sum_{t=1}^{n} x_{tj}^2 \right)^{1/2}}$$

and

$$\sum_{j=1}^{p} \mathrm{DFBETAS}_{ij}^2 = \frac{h_i}{1 - h_i}\, r_i^{*2} ,$$

where $r_i^* = r_i / (s_{(i)} \sqrt{1 - h_i})$ is the studentized residual.
Since the average value of $h_i$ is p/n, a rough average value for
$h_i/(1-h_i)$ is p/(n-p). Clearly (2.4.3) could be modified to reflect the
fact that some coefficients may be more important than others to the model
builder (e.g., including only the main estimates of interest).

Another obvious row deletion diagnostic is the change in fit

$$\mathrm{DFFIT}_i = x_i(\hat\beta - \hat\beta_{(i)}) = \frac{h_i}{1 - h_i}\, r_i . \tag{2.4.4}$$

If we scale this by dividing by $s_{(i)} \sqrt{h_i}$ we have

$$\left( \frac{h_i}{1 - h_i} \right)^{1/2} r_i^* , \tag{2.4.5}$$

where $r_i^* = r_i / (s_{(i)} \sqrt{1 - h_i})$ is the studentized residual.
For across data set normalization we will multiply by $\sqrt{(n-p)/p}$ to
obtain

$$\mathrm{DFFITS}_i = \left( \frac{n-p}{p} \cdot \frac{h_i}{1 - h_i} \right)^{1/2} r_i^* .$$

A measure similar to this has been suggested by Cook [1^].

Clearly DFFITS and NDFBETAS agree in an orthogonal coordinate system.
When orthogonality does not hold these two measures provide somewhat
different information. Since we tend to emphasize coefficients, our
preference is for NDFBETAS.

Deciding when a difference like $|(\hat\beta - \hat\beta_{(i)})_j|$ or other
diagnostic statistic is large will depend, in part, on how this information
is being used. For example, large changes in coefficients that are not of
particular interest might not overly upset the model builder while a change
in an important coefficient may cause considerable concern even though the
change is small relative to traditional estimation error.

We have used two approaches to measure the size of changes caused by
row deletion. The first, called external comparison, generally uses measures
associated with the quantity whose changes are being studied. For example,
the standard error of a particular coefficient $\hat\beta_j$ would be used
with $(\hat\beta - \hat\beta_{(i)})_j$.

The second method, called internal comparison, treats each set of
diagnostic values (e.g., $\{(\hat\beta - \hat\beta_{(i)})_j\}_{i=1}^{n}$) as a
single data series and then finds, for example, the standard deviation of
this series as a measure of relative size. As we have noted, all of the
diagnostic measures we have discussed so far are functions of
$r_i/\sqrt{1-h_i}$ and in view of our discussion of studentized residuals, it
is natural to divide this by $s_{(i)}$ to achieve a reasonable scaling before
making plots, etc.

Once $s_{(i)}$ has been used, the temptation arises to try to perform formal
statistical tests because we know the distribution of $r_i^*$. In our opinion
this is not a very promising procedure because it puts too much emphasis
on residuals (although looking at studentized residuals is better than
using the raw residuals). We prefer to use external or internal
comparison to make decisions about which data points deserve further
attention except, of course, when we are looking specifically at the
studentized residuals as we did earlier. Using any Gaussian distributional
theory depends on the appropriateness of the Gaussian error distribution -
a topic we will return to later.
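
The row deletion measures of this section can all be obtained from a single
least-squares fit. The Python sketch below is one possible arrangement, not
the SENSSYS implementation: the function name deletion_diagnostics, the NumPy
dependency and the simulated data are assumptions. It also relies on the
standard identity $(n-p-1)\,s_{(i)}^2 = (n-p)\,s^2 - r_i^2/(1-h_i)$, which is
not stated in the text above. The example uses a design with orthogonal (but
not orthonormal) columns, where NDFBETAS and the absolute value of the
normalized change in fit should coincide.

```python
import numpy as np


def deletion_diagnostics(X, y):
    """Return h_i, DFBETAS_ij, NDFBETAS_i and normalized DFFITS_i (hypothetical helper)."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, C, X)                    # hat-matrix diagonal
    r = y - X @ (C @ X.T @ y)                                # residuals
    s2 = r @ r / (n - p)
    s2_del = ((n - p) * s2 - r**2 / (1 - h)) / (n - p - 1)   # s_(i)^2 (assumed identity)
    s_del = np.sqrt(s2_del)
    dfbeta = (X @ C) * (r / (1 - h))[:, None]                # row i is beta - beta_(i), (2.4.1)
    dfbetas = dfbeta / (s_del[:, None] * np.sqrt(np.diag(C)))        # (2.4.2)
    ndfbetas = np.sqrt((n - p) / p * np.sum(dfbetas**2, axis=1))     # (2.4.3)
    r_star = r / (s_del * np.sqrt(1 - h))                    # studentized residual
    dffits = np.sqrt((n - p) / p) * np.sqrt(h / (1 - h)) * r_star
    return h, dfbetas, ndfbetas, dffits


rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(15, 3)))
X_orth = Q * np.array([1.0, 5.0, 0.2])       # orthogonal columns with unequal lengths
y = rng.normal(size=15)                      # made-up response

h, dfbetas, ndfbetas, dffits = deletion_diagnostics(X_orth, y)
gap = np.abs(ndfbetas - np.abs(dffits)).max()
print(gap)
```

For a non-orthogonal design the two summaries generally differ, which is the
situation in which plotting both can be informative.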

2.5 Regression Statistics

Most users of statistics realize that estimates like $\hat\beta$ should
have some measure of variability associated with them. It is less
often realized that regression statistics like t, $R^2$ and F should
also be thought of as having a variability associated with them.

One way to assess this variability is to examine the effects of row
deletion on these regression statistics. We have focused on three:

$$\Delta\mathrm{TSTAT}_{ij} = \frac{\hat\beta_j}{\mathrm{s.e.}(\hat\beta_j)} - \frac{\hat\beta_{j(i)}}{\mathrm{s.e.}(\hat\beta_{j(i)})}$$

$$\Delta\mathrm{FSTAT}_i = F(\text{all } \beta = 0) - F_{(i)}(\text{all } \beta = 0)$$

$$\Delta R^2_i = R^2 - R^2_{(i)}$$

Again we should ask when a difference is large enough to merit attention.
For external comparison we would compare to the standard deviation of
t, F, or $R^2$:

Statistic  Standard  Deviation 

./2 


A(n-D)^(n-2)       \l 
\t(n-p-2)(n-p-uy 


/2 


r2  C      '?-f        ^ 

\(p+n-2)    (p+n-1 


-J 


However, we tend to view internal comparison as more appropriate for
regression statistics.

Studying the changes in regression statistics is a good second order
diagnostic tool because if a row appears to be overly influential on
other grounds, an examination of the regression statistics will show
if the conclusions of hypothesis testing would be affected.

There is, of course, room for misuse of this procedure. Data points
could be removed solely on the basis of their ability (when removed) to
increase $R^2$ or some other measure. While this danger exists we feel
that it is often offset by the ability to study changes in regression
statistics caused by row deletion. Again we want to emphasize that changes in
regression statistics should not be used as a primary diagnostic tool.

2.6 Influence and Variance Decomposition

We now would like to consider perturbing our assumptions in a new
way. Consider the standard regression model but with $\mathrm{var}(\epsilon_i)$
replaced by $\sigma^2/w_i$ for just the $i$-th data point. In words, we are
perturbing the homoscedasticity assumption for this one data point.

In appendix 2 we show that

$$\frac{\partial \hat\beta}{\partial w_i} = \frac{(X^T X)^{-1} x_i^T r_i}{(1 - (1 - w_i) h_i)^2} \tag{2.6.1}$$

and it follows that

$$\left. \frac{\partial \hat\beta}{\partial w_i} \right|_{w_i = 1} = (X^T X)^{-1} x_i^T r_i \tag{2.6.2}$$

$$\left. \frac{\partial \hat\beta}{\partial w_i} \right|_{w_i = 0} = \frac{(X^T X)^{-1} x_i^T r_i}{(1 - h_i)^2} = \frac{\hat\beta - \hat\beta_{(i)}}{1 - h_i} . \tag{2.6.3}$$

Equation (2.6.2) tells us about infinitesimal changes in $\hat\beta$ caused
by small changes in $w_i$ about the value 1 and similarly for (2.6.3). From
the mean value theorem we know that

$$\hat\beta - \hat\beta_{(i)} = \left. \frac{\partial \hat\beta}{\partial w_i} \right|_{w_i = \xi} \tag{2.6.4}$$

where $\xi$ is between 0 and 1. Any one of (2.6.2), (2.6.3) or (2.6.4) can be
used for diagnostic purposes. We have chosen to emphasize
$\hat\beta - \hat\beta_{(i)}$ because of its intuitive appeal and the fact
that it is a compromise between (2.6.2) and (2.6.3).

Formula (2.6.2) can also be considered as a function which represents the
influence of the $i$-th data point and can be linked to the theory of robust
estimation [15] and the jackknife [15].
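
The derivative formula (2.6.1) and its two special cases can be checked
against finite differences of a weighted least-squares fit. The following
Python sketch does this on simulated data; NumPy, the data, the step size and
the function names are assumptions of the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, i = 14, 3, 0                       # perturb the weight of row i = 0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)                   # made-up response


def beta_w(w_i):
    """Weighted least squares with weight w_i on row i and weight 1 elsewhere."""
    w = np.ones(n)
    w[i] = w_i
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


C = np.linalg.inv(X.T @ X)
r_i = y[i] - X[i] @ beta_w(1.0)          # ordinary least-squares residual
h_i = X[i] @ C @ X[i]


def derivative(w_i):
    """Right-hand side of (2.6.1)."""
    return C @ X[i] * r_i / (1.0 - (1.0 - w_i) * h_i) ** 2


eps = 1e-6
gaps = []
for w_i in (1.0, 0.5, 0.0):
    numeric = (beta_w(w_i + eps) - beta_w(w_i - eps)) / (2 * eps)   # central difference
    gaps.append(np.abs(numeric - derivative(w_i)).max())

# (2.6.3): beta - beta_(i) equals (1 - h_i) times the derivative at w_i = 0.
deletion_gap = np.abs((beta_w(1.0) - beta_w(0.0)) - (1 - h_i) * derivative(0.0)).max()
print(gaps, deletion_gap)
```

Setting the weight to zero reproduces row deletion, which is why the change
$\hat\beta - \hat\beta_{(i)}$ sits between the derivative at $w_i = 1$ and
the derivative at $w_i = 0$.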
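If we let

$$s_{w_i}^2 = \frac{1}{n-p} \sum_{t=1}^{n} w_t\,(y_t - x_t \hat\beta_{w_i})^2 \tag{2.6.5}$$

and

$$W = \mathrm{diag}(1, \ldots, 1, w_i, 1, \ldots, 1) \tag{2.6.6}$$

then in appendix 2 we show that

$$\left. \frac{\partial}{\partial w_i} \left[ s_{w_i}^2 (X^T W X)^{-1} \right] \right|_{w_i = 1} = \frac{r_i^2}{n-p}\,(X^T X)^{-1} - s^2 (X^T X)^{-1} x_i^T x_i (X^T X)^{-1} . \tag{2.6.7}$$

Since we would like to remove scale we define

$$\mathrm{DBVARS}_{ij} = \frac{r_i^2}{(n-p)\,s^2} - \frac{\left[ (X^T X)^{-1} x_i^T x_i (X^T X)^{-1} \right]_{jj}}{(X^T X)^{-1}_{jj}} \tag{2.6.8}$$

as the scaled infinitesimal change in the variance of $\hat\beta_j$. As a
summary measure over all of the coefficients we use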

$$\mathrm{MDBVARS}_i = \frac{n}{p} \sum_{j=1}^{p} |\mathrm{DBVARS}_{ij}| \tag{2.6.9}$$

where the n/p term provides comparability across data sets.

If we used row deletion instead of derivatives, our basic measure
would be

2   T ,  "^    2    T      "^ 

3^   (XX)..   -  S,.XXt;y\,:.)^_^ 

DFBVARS..  =  ^ II  ^  /"^  ^-^  J^  (2.6.10) 

13 


with summary measure

$$\mathrm{MDFBVARS}_i = \frac{n}{p} \sum_{j=1}^{p} |\mathrm{DFBVARS}_{ij}| . \tag{2.6.11}$$

The measures so far discussed in this section include both the explanatory
variables and the response. If we wish to examine the X-matrix only, the
second part of (2.6.8) provides a good way to do this. We notice that

$$\sum_{i=1}^{n} (X^T X)^{-1} x_i^T x_i (X^T X)^{-1} = (X^T X)^{-1}$$

and define

$$\mathrm{BETAVRD}_{ij} = \frac{\left[ (X^T X)^{-1} x_i^T x_i (X^T X)^{-1} \right]_{jj}}{(X^T X)^{-1}_{jj}}$$

with summary measure

$$\mathrm{NBETAVRD}_i = \sum_{j=1}^{p} \mathrm{BETAVRD}_{ij} .$$

These measures provide a way to decompose the cross products matrix with
respect to the individual observations.
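
The decomposition can be illustrated numerically. In the Python sketch below
(NumPy, the simulated designs and the function name betavrd are assumptions),
each column of the BETAVRD array sums to one, so its entries can be read as
the share each observation contributes to the corresponding diagonal element
of $(X^T X)^{-1}$. With orthogonal columns the row sums reduce to the
hat-matrix diagonals.

```python
import numpy as np


def betavrd(X):
    """BETAVRD_ij = [(X^T X)^{-1} x_i^T x_i (X^T X)^{-1}]_jj / [(X^T X)^{-1}]_jj."""
    C = np.linalg.inv(X.T @ X)
    XC = X @ C                              # row i is x_i (X^T X)^{-1}
    return XC**2 / np.diag(C)


rng = np.random.default_rng(6)
X = np.column_stack([np.ones(12), rng.normal(size=(12, 2))])   # made-up design
B = betavrd(X)
column_sums = B.sum(axis=0)                 # each coefficient's shares add up to 1

Q, _ = np.linalg.qr(rng.normal(size=(12, 3)))
X_orth = Q * np.array([2.0, 0.5, 3.0])      # orthogonal, not orthonormal, columns
h_orth = np.sum(Q * Q, axis=1)              # leverages do not depend on column scaling
nbetavrd_gap = np.abs(betavrd(X_orth).sum(axis=1) - h_orth).max()
print(column_sums, nbetavrd_gap)
```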

Again it is useful to look at the orthogonal X case. When orthogonality
holds

$$\mathrm{BETAVRD}_{ij} = \frac{x_{ij}^2}{\sum_{t=1}^{n} x_{tj}^2}$$

and

$$\mathrm{NBETAVRD}_i = h_i .$$

Since $h_i$ has a strong intuitive appeal it may be a better summary value
even when orthogonality does not hold. We have chosen not to multiply
NBETAVRD by n/p (p/n being the average value for $h_i$), so it is not useful
across data sets.

If we examine the formula for DFBVARS we see that this quantity could
be positive or negative. As we might expect, in some cases downweighting a
data point can improve our estimate of the variance of a coefficient.
(Downweighting corresponds to placing a minus sign in front of DFBVARS.) One
of the best ways to examine the tradeoffs of DFBETAS and DFBVARS (or BETAVRD)
is to make a scatter plot. A high leverage point with small values of DFBETAS
may be a "good" observation because it is helping to reduce the variance of
certain coefficients. The setting aside of all high leverage points is
generally not an efficient procedure because it fails to take account of the
response data.

2.7 More Than One Row at a Time

It is natural to ask if there might be groups of leverage points that
we are failing to diagnose because we are only looking at one row at a time.
There are easily constructed examples where this can happen.

One approach is to proceed sequentially - remove the "worst" leverage
point (based perhaps on both NDFBETAS and NBETAVRD), reexamine the diagnostic
measures and remove the next "worst" observation, etc. This does not fully
cope with the problem of groups of leverage points and just as stepwise
regression can be troublesome, so can sequential row deletion.
A straightforward induction argument shows that


$$\delta_{ij} - h_{ij}(k_1, k_2, \ldots, k_t) = \frac{\det (I-H)_{i,k_1,k_2,\ldots,k_t;\; j,k_1,k_2,\ldots,k_t}}{\det (I-H)_{k_1,k_2,\ldots,k_t;\; k_1,k_2,\ldots,k_t}}$$

where $H$ is the hat matrix for all of the data, $h_{ij}(k_1, \ldots, k_t)$
denotes the hat matrix for a regression with rows $k_1, \ldots, k_t$ removed
and the subscripts on $I-H$ denote a submatrix formed by taking those rows
and columns of $I-H$.

Even though all of these differences are based on $H$, multiple row deletion
will involve large amounts of computation. It is instructive to note that

$$1 - h_i(k) = \frac{(1-h_i)(1-h_k) - h_{ik}^2}{1 - h_k} = (1-h_i) \left[ 1 - \frac{h_{ik}^2}{(1-h_i)(1-h_k)} \right] = (1-h_i) \left[ 1 - \mathrm{cor}^2(r_i, r_k) \right] .$$
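
Both the pairwise identity and the determinant ratio can be checked directly
against explicit refits. The Python sketch below uses simulated data; NumPy,
the chosen row indices and the helper name hat are assumptions of the
illustration. The residual correlations are taken from $I - H$, since the
covariance matrix of the residuals is $\sigma^2 (I - H)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # made-up design


def hat(A):
    """Hat matrix A (A^T A)^{-1} A^T."""
    return A @ np.linalg.inv(A.T @ A) @ A.T


H = hat(X)
h = np.diag(H)
M = np.eye(n) - H                                   # cov(r) = sigma^2 (I - H)
cor = M / np.sqrt(np.outer(np.diag(M), np.diag(M)))

i, k = 2, 5                                         # i < k, so row i keeps index i after deletion
direct_one = 1.0 - hat(np.delete(X, k, axis=0))[i, i]
formula_one = (1.0 - h[i]) * (1.0 - cor[i, k] ** 2)

ks = [5, 8]                                         # delete two rows, both after row i
direct_two = 1.0 - hat(np.delete(X, ks, axis=0))[i, i]
rows = [i] + ks
formula_two = np.linalg.det(M[np.ix_(rows, rows)]) / np.linalg.det(M[np.ix_(ks, ks)])
print(direct_one, formula_one, direct_two, formula_two)
```

Only submatrices of $I - H$ are needed on the formula side, so no refit is
required once $H$ is available.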


The term cor($r_i$, $r_k$) also appears when more rows are deleted and, in
place of looking at all possible subsets of rows, an examination of the
correlation matrix of the residuals for large correlations has provided
useful clues to groups of


increases computational cost and perhaps storage requirements.

2.8 Interface with Robust and Ridge Regression

It is natural to ask how the above diagnostics could or should be
used with some of the newer estimation methods like robust and ridge
regression. The first question is whether we should do diagnostics or robust
or ridge first. There is no clear answer, but some sort of iterative
procedure is probably called for.

However, it is possible to perform regression diagnostics after using
either a robust procedure or a ridge procedure. In the robust case we can
make use of weights

$$w_i = \frac{\rho'\big( (y_i - x_i \hat\beta_R)/s_R \big)}{(y_i - x_i \hat\beta_R)/s_R} \tag{2.8.1}$$

where $\rho$ is the robust loss function, $\hat\beta_R$ are the robust
estimates of $\beta$ and $s_R$ is a robust estimate of the scale of the
residuals, $y_i - x_i \hat\beta_R$. (A complete discussion of weights is
contained in [20].) We now modify the data by forming a diagonal matrix of
weights, $W$, and using $\sqrt{W}\,Y$, $\sqrt{W}\,X$. This revised data is
then the input to regression diagnostics. If the robust estimation procedure
has been allowed to converge

$$\hat\beta_w = (X^T W X)^{-1} X^T W Y$$

will be close to $\hat\beta_R$ and our procedures will accurately reflect
what would happen to $\hat\beta_R$ locally. Of course they do not reflect
what would happen if a data point were deleted and then robust estimation
applied.

The ridge estimator [21] is given by

$$\hat\beta_{RD} = (X^T X + kI)^{-1} X^T Y . \tag{2.8.2}$$

There are many generalizations but most will fit into the following
framework. We assume that k has been chosen by some means such as those
listed in [21]. Then we form

$$X_A = \begin{bmatrix} X \\ \sqrt{k}\, I \end{bmatrix}, \qquad Y_A = \begin{bmatrix} Y \\ 0 \end{bmatrix}$$

where 0 is a $p \times 1$ vector of zeros (prior values times $\sqrt{k}$ in
more general cases). So we now have "new" data $X_A$ and $Y_A$ with n+p rows.
Clearly

$$\hat\beta_{RD} = (X_A^T X_A)^{-1} X_A^T Y_A . \tag{2.8.3}$$

We now perform regression diagnostics using $X_A$ and $Y_A$. When we delete a
row with index n+i > n, it is equivalent to saying we do not want to "shrink"
that parameter estimate toward zero (or its prior). In the Bayesian context
dropping such a row is like setting the prior precision of $\beta_i$ to zero.
Plots of DFBETAS would then show the effects of such a process by looking
at those DFBETAS values for index greater than n.
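
The augmented-data view of ridge regression is easy to verify numerically.
In the Python sketch below the data, the value of k, the variable names and
the NumPy dependency are assumptions of the illustration. It checks that
least squares on the augmented data reproduces (2.8.2), and that deleting one
of the appended rows is the same as removing the penalty on the corresponding
coefficient only.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, k = 20, 3, 0.7                    # k is an arbitrary illustrative ridge constant
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)      # (2.8.2)

X_aug = np.vstack([X, np.sqrt(k) * np.eye(p)])                      # n + p rows
y_aug = np.concatenate([y, np.zeros(p)])
beta_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]             # (2.8.3)
aug_gap = np.abs(beta_ridge - beta_aug).max()

# Deleting appended row n + j removes the shrinkage on coefficient j only.
j = 1
penalty = k * np.eye(p)
penalty[j, j] = 0.0
beta_unshrunk_j = np.linalg.solve(X.T @ X + penalty, X.T @ y)
beta_row_deleted = np.linalg.lstsq(
    np.delete(X_aug, n + j, axis=0), np.delete(y_aug, n + j), rcond=None
)[0]
delete_gap = np.abs(beta_unshrunk_j - beta_row_deleted).max()
print(aug_gap, delete_gap)
```

Because the augmented problem is an ordinary least-squares problem, any
single-row diagnostic routine can be applied to it without modification.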

We can do some diagnostics to decide if a ridge estimator is warranted.
If we differentiate (2.8.2) with respect to k, then

$$\frac{\partial \hat\beta_{RD}}{\partial k} = -(X^T X + kI)^{-1} \hat\beta_{RD} \tag{2.8.4}$$

and

$$\left. \frac{\partial \hat\beta_{RD}}{\partial k} \right|_{k=0} = -(X^T X)^{-1} \hat\beta . \tag{2.8.5}$$

Thus (2.8.5) provides information about infinitesimal changes about k=0.
If $X^T X$ were diagonal then (2.8.5) has components
$-\hat\beta_j/\lambda_j$ where $\lambda_j$ are the eigenvalues. So
$\hat\beta_j$ large and/or $\lambda_j$ small would lead to a large value of
the derivative. Since the ridge estimator depends heavily on the scaling


Ices   (2.8.4)  es.z.   we  reccimen 


using this diagnostic measure.

When diagnostics have been completed a few observations may be suspect.
The rows can then be set aside and a new robust or ridge estimate computed.
Diagnostics can then be applied again. There are obvious limits of time and
money but we think that two passes through this process will often be
worthwhile.

2.9 An Example: An Inter-Country Life Cycle Savings Function

Arlie Sterling of MIT has made available to us data he has
collected on fifty countries in order to undertake a cross-sectional
study of the life cycle saving hypothesis. The savings ratio
(aggregate personal saving divided by disposable income) is explained
by per capita disposable income, the percentage rate of change in per
capita disposable income and two population variables: per cent less
than 15 years old and per cent over 75 years old. The data are averaged
over the decade 1960-1970 to remove the business cycle or other short-term
fluctuations.

According to the life cycle hypothesis, savings rates should be
negatively affected if non-members of the labor force constitute a large
part of the population. Income is not expected to be important since
age distribution and the rate of income growth constitute the core of
life cycle savings behavior. The regression equation and variable
definitions are then:

SR_i = COEF.1 + COEF.2*POP15_i + COEF.3*POP75_i + COEF.4*INC_i
       + COEF.5*INGRO_i                                          (2.9.1)

SR_i    = the average aggregate personal savings rate in
          country i from 1960-1970

POP15_i = the average % of the population under 15 years
          of age from 1960-1970

POP75_i = the average % of the population over 75 years
          of age from 1960-1970

INC_i   = the average level of real per capita disposable
          income in country i from 1960-1970 measured in
          U.S. dollars

INGRO_i = the average % growth rate of INC_i from
          1960-1970

A full list of countries, together with their numerical designation,
appears in Exhibit 1, and the data are in Exhibit 2. It is evident that
a wide geographic area and span of economic development are included. It is
also plausible to suppose that the quality of the underlying data is
highly variable. With these obvious caveats, the LS estimates of (2.9.1)
are shown in Exhibit 3. To comment briefly, the $R^2$ is not
uncharacteristically low for cross-sections, the population variables have
correct negative signs - COEF 3 has a small t statistic but COEF 2 does not -
income is statistically insignificant, while income growth reflected in
COEF 5 is significant at the 5 per cent level and has a positive influence
on the savings rate as it should. Broadly speaking, these results are
consistent with the life cycle hypothesis.

The remainder of this section will be a guided tour through some
of the diagnostics discussed previously. The computations were performed
using SENSSYS (acronym for sensitivity system), a TROLL experimental
subsystem for regression diagnostics. Orthogonal decompositions are used in
the

of the diagnostic measures in addition to the usual LS results in less than
twice the computer time for the LS results alone.
David Jones and Steve Peters of the NBER Computer Research Center have
programmed SENSSYS. Both have actively participated in analytical and
empirical aspects of the research.

Only a selection of plots and diagnostics will be shown for two reasons.
One is that to provide the full battery of plots would be excessively
tedious; however, the missing plots and tables are readily obtainable. The
other reason is that we found these diagnostics to be among the more
instructive from examination of this and several other problems.

2.9.1 Residuals

The first plot, Exhibit 4, is a normal probability plot. Departure from
a fitted line (which represents a particular Gaussian distribution with mean
equal to the intercept and standard deviation equal to the slope) is not
substantial in the main body of the data for these studentized residuals, but
Zambia (46) is an extreme residual which departs from the line. Different
information, an index plot of the $r_i$, appears in Exhibit 5 which reveals
not only Zambia, but possibly Chile (7) as well to be an outlier; each
exceeds 2.5 times the standard error.

2.9.2 Leverage and Diagonal Hat Matrix Entries

Exhibit 6 plots the $h_i$ which, as diagonals of the hat matrix, are
indicative of leverage points. Most of the $h_i$ are small, but two stand out
sharply: Libya (49) and the United States (44). Two others, Japan (23) and
Ireland (21) exceed the 2p/n = .20 criterion (which happens to be equal to
the 95% significance level based on the F distribution), but just barely.
Deciding whether or not leverage is potentially detrimental depends on what
happens elsewhere in the diagnostic analysis, although it should be recalled
that it is values near unity that cause the most severe problems, which has
not happened here.

2.9.3 Coefficient Perturbation

An overview of the effects of individual row deletion (see Exhibit 7)
is based on (2.4.3) NDFBETAS, the square root of the scaled sum of the
squared differences between the full data set and row deleted coefficients.
The measure used is scaled approximately as the t distribution so that values
greater than 1 are a potential source of concern. Two countries that also
showed up as possible high leverage candidates, Libya (49) and Japan (23),
also seem to have a heavy influence on the coefficients while Ireland (21), a
marginal high leverage candidate, is also a marginal candidate for
influencing coefficient behavior. Individual plots of DFBETAS (2.4.2) follow
next, from which the following table has been constructed based on an
examination of Exhibits 8-11.

Noticeably Large Effects on $\hat\beta_j$ from Row Deletion


Population <15     Population >75     Income     Income Growth


Japan  (23)        Ireland  (21)  Libya  (49) 

Japan  (23)  Japan  (23) 

The countries that stand out in the individual coefficients are perhaps,
not surprisingly, the two that appeared in the overall measure. Ireland, in
addition, appears once. Except on the income variable, the comparatively
large values are just about one LS standard error for each particular
coefficient.


2.9.4 Variation in Coefficient Standard Errors

Exhibit 12 is a summary measure of coefficient standard error variations
as a consequence of row deletions, designated as MDFBVARS in (2.6.11). Since

„  -1 


..i:ie  ata. 
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iar-ge  values  indicate  simulxanacus  or  individual  extreraes  in  residuals  cr 
aiulrivariaxe  outliers  in  the  X  ratrix.     These  q'oite  n'JTierous  candidates 
iriclude : 


Index 


7 

Chile 

21 

Ireland 

23 

Japan 

37 

Southern  Phodesia 

1+4 

United  States 

1+6 

Zambia 

49 

Libya 

Of  these  seven  countries,  3i:<  appeared  previously,  while  the  only  new 
candidate  is  Southern  Phodesia.  Libya  had  both  high  leverage  and  large 
coefficient  changes ,  Ireland  and  Japan  had  noticeable  coefficient  changes , 
while  Cl'iile  and  Zanbia  possess  large  residuals.  Thus  this  particular 

diagnostic  may  have  some  use  as  a  comprehensive  m.easure. 

Plots for percent changes in the individual coefficient standard
errors are shown in Exhibits 13-15. Large individual changes (here taken to
be in excess of 25%) appear for the United States with a 47% change for
the income variable, while the deletion of Libya increases the standard error
for the same variable by nearly 35%.
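
The row-deletion comparison behind these percent changes can be sketched in a few lines of Python. This is a minimal illustration only, assuming a two-parameter model (intercept and slope), invented data, and hypothetical function names (`ols_simple`, `se_percent_changes`); it refits the regression with each row removed and is not the computation used for the savings-rate model. The 25% threshold is the one quoted above.

```python
import math

def ols_simple(xs, ys):
    """Fit y = b0 + b1*x by least squares; return coefficients and standard errors."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                                # det(X'X) for rows (1, x)
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # (X'X)^{-1}
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s2 = ssr / (n - 2)                                     # residual variance, p = 2
    se = [math.sqrt(s2 * inv[0][0]), math.sqrt(s2 * inv[1][1])]
    return [b0, b1], se

def se_percent_changes(xs, ys):
    """Percent change in each standard error when row i is deleted, for every i."""
    _, se_full = ols_simple(xs, ys)
    out = []
    for i in range(len(xs)):
        _, se_del = ols_simple(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append([100.0 * (d - f) / f for d, f in zip(se_del, se_full)])
    return out

# Invented data: six ordinary rows and one row far out in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 19.5]
changes = se_percent_changes(xs, ys)

# Rows whose deletion moves some standard error by more than 25 percent.
flagged = [i for i, row in enumerate(changes) if max(abs(c) for c in row) > 25.0]
```

In this toy data set the last row supplies most of the spread in x, so deleting it inflates the slope's standard error sharply and the row is flagged.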

2.9.5  Change  in  Fit 

The standardized change in fit, DFFITS (2.4.5), with a row deleted,
while similar in algebraic structure to coefficient change, conveys somewhat
different information of general interest with specific applications in a time
series context. DFFITS can be viewed in some theoretical cases as having a [...]

In Exhibit 17 three countries that surfaced previously reappear:
Japan (23), Zambia (46) and Libya (49). When coefficient changes
alone are considered as shown in Exhibit 7, Zambia did not appear, while
Ireland (21) did. Thus somewhat different information is contained in each.
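
A scaled change in fit of this kind can be computed without refitting. The Python sketch below is illustrative only and rests on several assumptions: the scaling used is the now-common one, (fitted value minus deleted fitted value) divided by s(i) times the square root of h_i, which may differ in detail from (2.4.5); the model has two parameters; the data are invented; and the function names are hypothetical. It checks the closed form against explicit row deletion.

```python
import math

def fit_line(xs, ys):
    """Least squares for y = b0 + b1*x; returns (b0, b1, (X'X)^{-1}, SSR)."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, inv, ssr

def hat_diag(inv, x):
    """h_i = x_i (X'X)^{-1} x_i' for the row x_i = (1, x)."""
    return inv[0][0] + 2.0 * inv[0][1] * x + inv[1][1] * x * x

def dffits_closed_form(xs, ys):
    """Scaled change in fit from full-data quantities only."""
    n, p = len(xs), 2
    b0, b1, inv, ssr = fit_line(xs, ys)
    out = []
    for x, y in zip(xs, ys):
        h = hat_diag(inv, x)
        r = y - b0 - b1 * x
        s_del = math.sqrt((ssr - r * r / (1.0 - h)) / (n - p - 1))
        out.append(math.sqrt(h) * r / ((1.0 - h) * s_del))
    return out

def dffits_by_refitting(xs, ys):
    """The same quantity obtained by deleting each row and refitting."""
    n, p = len(xs), 2
    b0, b1, inv, _ = fit_line(xs, ys)
    out = []
    for i, x in enumerate(xs):
        d0, d1, _, ssr_del = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        s_del = math.sqrt(ssr_del / (n - 1 - p))
        change = (b0 + b1 * x) - (d0 + d1 * x)   # change in the fitted value for row i
        out.append(change / (s_del * math.sqrt(hat_diag(inv, x))))
    return out

# Invented data; the last row is far out in x and off the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
ys = [1.5, 1.6, 3.4, 3.5, 5.6, 5.4, 12.0]
closed = dffits_closed_form(xs, ys)
refit = dffits_by_refitting(xs, ys)
```

The two lists agree to rounding error, which is the practical appeal of the closed form: one regression yields the diagnostic for every row.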


It is now desirable to bring together the information that has been
assembled thus far, to see what it all adds up to. One useful summary plot
is shown in Exhibit 18, which plots the summary measure of β - β(i), NDFBETAS,
against the corresponding hat matrix diagonal, h_i.

The first point which emerges is that Japan (23) and Libya (49) have
both high leverage and a significant influence on the estimated parameters.
This is reason enough to view them as serious problems. (After the analysis
had reached this point, we were informed by Arlie Sterling that a data error
had been discovered for Japan. When corrected, he tells us that the revised
data is more similar to the majority of countries. These diagnostics have
thus "proven their worth" in bad data detection in a modest way.) Second,
Ireland is an in-between case, with moderately large leverage and a somewhat
disproportionate impact on the coefficient estimates.

Third, the United States has high leverage combined with only meager
differential effect on the estimated coefficients. Thus leverage in this
instance can be viewed as neutral or beneficial. It is important to note
that not all leverage points cause large changes in β.
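
The distinction between leverage and influence can be made concrete with a small numerical sketch. The Python fragment below is an illustration under stated assumptions, not the paper's computation: a two-parameter model, invented data, a hypothetical function name (`leverage_and_influence`), and a scaling of each coefficient change by s(i) times the square root of the corresponding diagonal element of (X'X)^{-1}, which is assumed here and may differ from (2.4.2). The same high-leverage row is given two different y values: one in line with the other rows, one far from them.

```python
import math

def leverage_and_influence(xs, ys):
    """Return (hat diagonals, norm of scaled coefficient changes) for y = b0 + b1*x."""
    n, p = len(xs), 2
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # (X'X)^{-1}
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    res = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    ssr = sum(r * r for r in res)
    hats, norms = [], []
    for x, r in zip(xs, res):
        c0 = inv[0][0] + inv[0][1] * x                     # (X'X)^{-1} x_i', component 0
        c1 = inv[1][0] + inv[1][1] * x                     # component 1
        h = c0 + c1 * x                                    # h_i = x_i (X'X)^{-1} x_i'
        s_del = math.sqrt((ssr - r * r / (1.0 - h)) / (n - p - 1))
        # Coefficient change on deleting row i, scaled coefficient by coefficient.
        d0 = c0 * r / (1.0 - h) / (s_del * math.sqrt(inv[0][0]))
        d1 = c1 * r / (1.0 - h) / (s_del * math.sqrt(inv[1][1]))
        hats.append(h)
        norms.append(math.sqrt(d0 * d0 + d1 * d1))
    return hats, norms

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
ys_consistent = [1.5, 1.6, 3.4, 3.5, 5.6, 5.4, 18.4]   # last row agrees with the rest
ys_discrepant = [1.5, 1.6, 3.4, 3.5, 5.6, 5.4, 12.0]   # last row disagrees

hats_a, norms_a = leverage_and_influence(xs, ys_consistent)
hats_b, norms_b = leverage_and_influence(xs, ys_discrepant)
```

Leverage depends only on X, so the last row has the same hat diagonal in both runs; only when its y value departs from the pattern of the other rows does the summary of coefficient change become large.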

Exhibit 19 plots the summary of coefficient change, NDFBETAS, against
the studentized residuals and visually drives home the point that large
residuals do not necessarily coincide with large changes in coefficients; all of
the large changes in coefficients are associated with standardized residuals
less than 2. Thus residual analysis alone is not a sufficient diagnostic tool.

Another summary plot, that of change in coefficient standard error,
NDFBVARS, against leverage as measured by h_i, in Exhibit 20, indicates the
close anticipated association between leverage and estimated parameter
variability. This is clearly shown by the diagonal line composed of (21) Ireland,
(23) Japan, (44) United States and (49) Libya. But residuals also can have
a large and separate influence, as evidenced by the low leverage, high
standard error changes for (7) Chile and (46) Zambia.

A final summary plot, Exhibit 21, of NDFBETAS against NDFBVARS, is revealing
in that all of the points noted outside the cutoff points (3,2) have been
spotted in the previous diagnostics as worth another look for one reason or
another. Thus about 15% of the observations have been flagged, not an
excessive fraction for many data sets.

2.9.7 One Further Step

Since Libya (49) is clearly an extreme and probably deleterious influence
on the original regression, a reasonable next step is to eliminate it to find
out whether its presence has masked other problems or not. Exhibit 22 plots
the h_i when Libya (49) has been excluded from the data set. There is only one
noticeable difference since Ireland (21), Japan (23) and the United States (44)
remain high leverage points. Southern Rhodesia (37) now appears as a
marginally significant leverage point, whereas it had previously been just
below the cutoff. The only really new fact is that Jamaica (47) now appears
as a prominent leverage point.

Jamaica has furthermore now become a source of parameter influence which
is perhaps most effectively observed in the recalculation of scaled parameter
changes, NDFBETAS, in Exhibit 23 which reveals Jamaica as the single largest [...]

This illustrates the proposition that perverse extreme points can mask the
impact of still other perverse points. Yet the original analysis did
contain most of the pertinent information about exceptional data behavior.
The correlation matrix of the residuals discussed in Section 2.7 provided
a clue, since the squared correlation between (47) and (49) was .173,
the highest value. It is nevertheless a prudent step to reanalyze the data
with suspect points removed, to ascertain whether one or more extreme or
suspect data points have obscured or dominated others.
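
Masking of one leverage point by another is easy to reproduce on a small scale. The Python sketch below is an illustration under assumptions: a regression with an intercept and a single explanatory variable (for which h_i = 1/n + (x_i - mean)^2 / Sxx), invented x values, hypothetical function names, and a cutoff of 2p/n, a common rule of thumb adopted here for the sketch and not necessarily the cutoff used in the exhibits.

```python
def hat_diagonals(xs):
    """Hat matrix diagonals for a regression on an intercept and one variable x."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1.0 / n + (x - mean) ** 2 / sxx for x in xs]

def high_leverage(xs, p=2):
    """x values whose hat diagonal exceeds the assumed cutoff 2p/n."""
    cutoff = 2.0 * p / len(xs)
    return [x for x, h in zip(xs, hat_diagonals(xs)) if h > cutoff]

# Invented data: 30 is the most extreme value, 12 the next most extreme.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 12.0, 30.0]

flagged_before = high_leverage(xs)        # with every row present
flagged_after = high_leverage(xs[:-1])    # after removing the row with x = 30
```

With all rows present only x = 30 exceeds the cutoff; once it is removed, x = 12 emerges as a high leverage point, which is why a second pass over the reduced data can be worthwhile.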

2.10 Final Comments

The question naturally arises as to whether the approach we have taken
in detection of outliers is more effective than simply examining each
individual column of the data to look for detached observations. We believe
the answer is yes. Detached outliers did appear in column 5 (CJGRO) of the
X matrix for Libya (49) and Jamaica (47), but not elsewhere. Libya, of
course, was "the villain of the piece" in the prior analysis. But leverage
points for numerous other countries were revealed by row deletion diagnostics,
while Jamaica, as matters turned out, was not a particularly troublesome data
point. In addition we discussed how various leverage points affected our
output: coefficients, fit, or both. So we conclude, at this early stage of
our investigation, that these new procedures have merit in uncovering discrepant
data, something that is not possible with a high degree of confidence by just
looking at the raw data.




References

[1] Huber, P.J., "Robust Regression: Asymptotics, Conjectures and Monte Carlo,"
Annals of Statistics, 1 (1973), pp. 799-821.

[2] Belsley, David A., "Multicollinearity: Diagnosing its Presence and Assessing
the Potential Damage it Causes Least-Squares Estimation," Working Paper 154,
National Bureau of Economic Research, Computer Research Center for Economics
and Management Science, October 1976, Cambridge, Mass.

[3] Silvey, S.D., "Multicollinearity and Imprecise Estimation," Journal of the
Royal Statistical Society, Series B, Vol. 31, 1969, pp. 539-552.

[4] Theil, H., Linear Aggregation of Economic Relations, North-Holland, Amsterdam,
1954.

[5] Chow, Gregory C., "Tests of Equality Between Sets of Coefficients in Two Linear
Regressions," Econometrica, Vol. 28, 1960, pp. 591-605.

[6] Fisher, Franklin M., "Tests of Equality Between Sets of Coefficients in Two
Linear Regressions: An Expository Note," Econometrica, Vol. 38, 1970,
pp. 361-366.

[7] Goldfeld, Stephen M. and Richard E. Quandt, "The Estimation of Structural
Shifts by Switching Regressions," Annals of Economic and Social Measurement,
Vol. 2, No. 4, 1973, pp. 475-485.

[8] Quandt, Richard E., "A New Approach to Estimating Switching Regressions,"
Journal of the American Statistical Association, Vol. 67, 1972, pp. 306-310.

[9] Brown, R.L., J. Durbin and J.M. Evans, "Techniques for Testing the Constancy
of Regression Relationships," Journal of the Royal Statistical Society,
Series B, Vol. 37, No. 2, 1975, pp. 149-163.

[10] Theil, H., Principles of Econometrics, John Wiley and Sons, New York, 1971,
pp. 193-195.

[11] Anscombe, F.J. and Tukey, J.W., "The Examination and Analysis of Residuals,"
Technometrics, 5 (1963), pp. 141-160.

[12] Allen, David M., "The Relationship Between Variable Selection and Data
Augmentation and a Method for Prediction," Technometrics, 16 (1974), pp.
125-127.

[13] Hoaglin, D.C. and Welsch, R.E., "The Hat Matrix in Regression and Anova,"
Memorandum NS-341, Department of Statistics, Harvard University, December 1976.

[14] Cook, R.D., "Detection of Influential Observations in Linear Regression,"
Technometrics, 19 (1977), pp. 15-18.

[15] Mallows, C.L., "On Some Topics in Robustness." Paper delivered at the Eastern [...]




[16] Miller, R.G., "The Jackknife: A Review," Biometrika, 61 (1974), pp. 1-15.

[17] Denby, L. and Mallows, C.L., "Two Diagnostic Displays for Robust Regression
Analysis," Technometrics, 19 (1977), pp. 1-13.

[18] Welsch, R.E., "Graphics for Data Analysis," Computers and Graphics, 2 (1976),
pp. 31-37.

[19] Rao, C.R., Linear Statistical Inference and Its Applications, New York:
Wiley, 1965.

[20] Welsch, R.E., "Confidence Regions for Robust Regression," 1975 Proceedings
of the ASA Statistical Computing Section, Washington, D.C.: American
Statistical Association.

[21] Hoerl, A.E. and R.W. Kennard, "Ridge Regression: Biased Estimation for
Nonorthogonal Problems," Technometrics, 12 (1970), pp. 55-67.

[22] Holland, P.W., "Weighted Ridge Regression: Combining Ridge and Robust Regression
Methods," WP 11, NBER Computer Research Center, Cambridge, Mass.

[23] Lucas, Robert, "The Phillips Curve and Labor Markets", Carnegie-Rochester
Conference Series on Public Policy (editors Brunner and Meltzer), Supplementary
Series to the Journal of Monetary Economics, North-Holland Publishing
Co., 1976.




Appendix 1. BASIC DIFFERENCE FORMULAS

The fundamental difference formulas are known as the Sherman-Morrison-Woodbury
Theorem [19, p. 29]:

$$(X_{(i)}^T X_{(i)})^{-1} = (X^T X)^{-1} + \frac{(X^T X)^{-1} x_i^T x_i (X^T X)^{-1}}{1 - h_i} \tag{A1.1}$$

$$(X^T X)^{-1} = (X_{(i)}^T X_{(i)})^{-1} - \frac{(X_{(i)}^T X_{(i)})^{-1} x_i^T x_i (X_{(i)}^T X_{(i)})^{-1}}{1 + x_i (X_{(i)}^T X_{(i)})^{-1} x_i^T} \tag{A1.2}$$

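From this comes

$$\hat\beta - \hat\beta_{(i)} = \frac{(X^T X)^{-1} x_i^T r_i}{1 - h_i} \tag{A1.3}$$

and since

$$(n-p-1)\, s_{(i)}^2 = \sum_{t \neq i} (y_t - x_t \hat\beta_{(i)})^2$$

we get

$$
\begin{aligned}
(n-p-1)\, s_{(i)}^2 &= \sum_{t \neq i} \left( r_t + \frac{h_{ti}\, r_i}{1 - h_i} \right)^2 \\
&= (n-p)\, s^2 + \frac{2 r_i}{1 - h_i} \sum_{t=1}^{n} r_t h_{ti} + \frac{r_i^2}{(1 - h_i)^2} \sum_{t=1}^{n} h_{ti}^2 - \frac{r_i^2}{(1 - h_i)^2} \\
&= (n-p)\, s^2 - \frac{r_i^2}{1 - h_i}.
\end{aligned}
$$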




Finally we obtain

$$(n-p)\, s^2 (X^T X)^{-1} - (n-p-1)\, s_{(i)}^2 (X_{(i)}^T X_{(i)})^{-1} = \frac{r_i^2}{1 - h_i} (X_{(i)}^T X_{(i)})^{-1} - (n-p)\, s^2\, \frac{(X^T X)^{-1} x_i^T x_i (X^T X)^{-1}}{1 - h_i}. \tag{A1.4}$$
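
The deletion formulas of this appendix lend themselves to a direct numerical check. The Python sketch below is illustrative only, assuming a two-parameter model (rows x_i = (1, x)), invented data, and hypothetical function names; it compares the Sherman-Morrison update (A1.1), the deleted coefficients, and the deleted residual sum of squares (n-p-1)s^2(i), each computed from full-data quantities, against an explicit refit with row i removed.

```python
def fit(xs, ys):
    """Least squares for y = b0 + b1*x; returns (b, (X'X)^{-1}, SSR)."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    ssr = sum((y - b[0] - b[1] * x) ** 2 for x, y in zip(xs, ys))
    return b, inv, ssr

def deletion_by_formula(xs, ys, i):
    """Row-deletion results obtained from the full-data fit alone."""
    b, inv, ssr = fit(xs, ys)
    x = xs[i]
    c = [inv[0][0] + inv[0][1] * x, inv[1][0] + inv[1][1] * x]   # (X'X)^{-1} x_i'
    h = c[0] + c[1] * x                                           # h_i
    r = ys[i] - b[0] - b[1] * x                                   # r_i
    # (A1.1): rank-one update of the inverse; c c' equals (X'X)^{-1} x_i' x_i (X'X)^{-1}.
    inv_del = [[inv[j][k] + c[j] * c[k] / (1.0 - h) for k in range(2)] for j in range(2)]
    # Deleted coefficients: subtract (X'X)^{-1} x_i' r_i / (1 - h_i).
    b_del = [b[j] - c[j] * r / (1.0 - h) for j in range(2)]
    # Deleted residual sum of squares: (n-p) s^2 - r_i^2 / (1 - h_i).
    ssr_del = ssr - r * r / (1.0 - h)
    return b_del, inv_del, ssr_del

# Invented data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
ys = [1.5, 1.6, 3.4, 3.5, 5.6, 5.4, 12.0]

max_err = 0.0
for i in range(len(xs)):
    b_f, inv_f, ssr_f = deletion_by_formula(xs, ys, i)
    b_d, inv_d, ssr_d = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    diffs = [abs(b_f[j] - b_d[j]) for j in range(2)]
    diffs += [abs(inv_f[j][k] - inv_d[j][k]) for j in range(2) for k in range(2)]
    diffs.append(abs(ssr_f - ssr_d))
    max_err = max(max_err, max(diffs))
```

The largest discrepancy over all rows is at the level of rounding error, so every deletion diagnostic built on these formulas costs one regression rather than n.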




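Appendix 2. DIFFERENTIATION FORMULAS

Let

$$W = \mathrm{diag}(1, \ldots, 1, w_i, 1, \ldots, 1) \tag{A2.1}$$

(the identity matrix with $w_i$ in the $i$-th diagonal position) and

$$\hat\beta_w = (X^T W X)^{-1} X^T W Y, \tag{A2.2}$$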


from (A1.1) we obtain

$$(X^T W X)^{-1} = (X^T X)^{-1} + \frac{(1 - w_i)(X^T X)^{-1} x_i^T x_i (X^T X)^{-1}}{1 - (1 - w_i) h_i} \tag{A2.3}$$

and then

$$\frac{\partial}{\partial w_i} (X^T W X)^{-1} = -(X^T W X)^{-1} x_i^T x_i (X^T W X)^{-1}. \tag{A2.4}$$


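Some algebraic manipulation using (A2.2) and (A2.3) gives

$$\hat\beta_w = \hat\beta - (X^T X)^{-1} x_i^T r_i\, \frac{(1 - w_i)}{1 - (1 - w_i) h_i} \tag{A2.5}$$

where $\hat\beta$ and $r_i$ are the least-squares estimate and residual obtained when $w_i = 1$. Thus

$$\frac{\partial \hat\beta_w}{\partial w_i} = \frac{(X^T X)^{-1} x_i^T r_i}{(1 - (1 - w_i) h_i)^2} \tag{A2.6}$$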



or equivalently (again using (A2.3))

$$\frac{\partial \hat\beta_w}{\partial w_i} = (X^T W X)^{-1} x_i^T (y_i - x_i \hat\beta_w). \tag{A2.7}$$

It is also useful to look at the squared residual error

$$SSR_w = \sum_{t=1}^{n} w_t (y_t - x_t \hat\beta_w)^2. \tag{A2.8}$$

Using (A2.7) we have

$$\frac{\partial SSR_w}{\partial w_i} = -2 \sum_{t=1}^{n} w_t (y_t - x_t \hat\beta_w)\, x_t (X^T W X)^{-1} x_i^T (y_i - x_i \hat\beta_w) + (y_i - x_i \hat\beta_w)^2. \tag{A2.9}$$


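For the data $\sqrt{W}\, Y$ and $\sqrt{W}\, X$,

$$H_w = \sqrt{W}\, X (X^T W X)^{-1} X^T \sqrt{W} \quad \text{and} \quad H_w r_w = 0,$$

where $r_w = \sqrt{W}\,(Y - X \hat\beta_w)$ is the residual vector of the weighted problem. This implies that the sum in (A2.9) is zero so that

$$\frac{\partial SSR_w}{\partial w_i} = (y_i - x_i \hat\beta_w)^2 = \frac{r_i^2}{(1 - (1 - w_i) h_i)^2} \tag{A2.10}$$

because of (A2.3).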




Putting (A2.4) and (A2.10) together gives

$$\frac{\partial}{\partial w_i} \left[ SSR_w (X^T W X)^{-1} \right] = \frac{r_i^2}{(1 - (1 - w_i) h_i)^2} (X^T W X)^{-1} - SSR_w\, \frac{(X^T X)^{-1} x_i^T x_i (X^T X)^{-1}}{(1 - (1 - w_i) h_i)^2}. \tag{A2.11}$$

When $w_i = 1$ this is equivalent to

$$r_i^2 (X^T X)^{-1} - (n-p)\, s^2 (X^T X)^{-1} x_i^T x_i (X^T X)^{-1}. \tag{A2.12}$$
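
The weighted formulas can be checked numerically in the same spirit. The Python sketch below is an illustration under assumptions: a two-parameter model, invented data, hypothetical function names, and an arbitrary choice of row (i = 2) and weight (w_i = 0.5). It compares the closed form (A2.5) with a direct weighted least-squares fit, and the derivative (A2.10) with a central finite difference of the weighted residual sum of squares.

```python
def wls(xs, ys, ws):
    """Weighted least squares for y = b0 + b1*x; returns (b, (X'WX)^{-1}, SSR_w)."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    inv = [[swxx / det, -swx / det], [-swx / det, sw / det]]
    b = [inv[0][0] * swy + inv[0][1] * swxy, inv[1][0] * swy + inv[1][1] * swxy]
    ssr = sum(w * (y - b[0] - b[1] * x) ** 2 for w, x, y in zip(ws, xs, ys))
    return b, inv, ssr

def weights(n, i, w):
    """Diagonal of W: all ones except w in position i."""
    return [w if t == i else 1.0 for t in range(n)]

def from_unweighted_fit(xs, ys, i, w):
    """beta_w via (A2.5) and dSSR_w/dw_i via (A2.10), using only the w_i = 1 fit."""
    b, inv, _ = wls(xs, ys, [1.0] * len(xs))
    x = xs[i]
    c = [inv[0][0] + inv[0][1] * x, inv[1][0] + inv[1][1] * x]   # (X'X)^{-1} x_i'
    h = c[0] + c[1] * x                                           # h_i
    r = ys[i] - b[0] - b[1] * x                                   # r_i
    d = 1.0 - (1.0 - w) * h
    beta_w = [b[j] - c[j] * r * (1.0 - w) / d for j in range(2)]
    return beta_w, (r / d) ** 2

# Invented data and an arbitrary row and weight.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
ys = [1.5, 1.6, 3.4, 3.5, 5.6, 5.4, 12.0]
i, w = 2, 0.5

beta_formula, dssr_formula = from_unweighted_fit(xs, ys, i, w)
beta_direct = wls(xs, ys, weights(len(xs), i, w))[0]
beta_err = max(abs(a - b) for a, b in zip(beta_formula, beta_direct))

eps = 1e-6
ssr_hi = wls(xs, ys, weights(len(xs), i, w + eps))[2]
ssr_lo = wls(xs, ys, weights(len(xs), i, w - eps))[2]
deriv_err = abs((ssr_hi - ssr_lo) / (2.0 * eps) - dssr_formula)
```

Both discrepancies are negligible, which supports reading the derivative formulas as an infinitesimal counterpart of row deletion: setting w_i = 0 in (A2.5) recovers the deletion formula of Appendix 1.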




Appendix 3. THEOREMS ON THE HAT MATRIX

In this appendix we formally show that when $h_i = 1$ (we can take $i = 1$
without loss of generality), there exists a nonsingular transformation $T$
such that $\hat\alpha_1 = (T^{-1}\hat\beta)_1 = y_1$ and $\hat\alpha_2, \ldots, \hat\alpha_p$ do not depend on $y_1$. This implies
that, in the transformed coordinate system, the parameter $\alpha_1$ has been dedicated
to observation 1.

When $h_1 = 1$ we have for the coordinate vector $e_1 = (1, 0, \ldots, 0)^T$

$$X (X^T X)^{-1} X^T e_1 = e_1$$

since $h_{1j} = 0$, $j \neq 1$. Let $P$ be any $p \times p$ nonsingular matrix whose first column
is $(X^T X)^{-1} X^T e_1$. Then

$$X P = \begin{bmatrix} 1 & a \\ 0 & A \end{bmatrix}$$

where $a$ is $1 \times (p-1)$ and $0$ is $(n-1) \times 1$. Now let

$$Q = \begin{bmatrix} 1 & -a \\ 0 & I \end{bmatrix}$$

with $I$ denoting the $(p-1) \times (p-1)$ identity matrix. The transformation we seek
is given by $T = PQ$, which is nonsingular because both $P$ and $Q$ have inverses.
Clearly

$$X T = \begin{bmatrix} 1 & 0 \\ 0 & A \end{bmatrix}$$



and the least-squares estimate of the parameter $\alpha = T^{-1}\beta$ will have the first
residual, $y_1 - \hat\alpha_1$, equal to zero since $\alpha_2, \ldots, \alpha_p$ cannot affect this residual.
This also implies that $\hat\alpha_2, \ldots, \hat\alpha_p$ will not depend on $y_1$.

To prove the second theorem in Section 2.3,

$$\det(X_{(i)}^T X_{(i)}) = (1 - h_i) \det(X^T X),$$

we need first to show that

$$\det(I - u v^T) = 1 - v^T u$$

where $u$ and $v$ are column vectors. Let $Q$ be an orthonormal matrix such that

$$Q u = \|u\|\, e_1 \tag{A3.1}$$

where $e_1$ is the first standard basis vector. Then

$$\det(I - u v^T) = \det Q [I - u v^T] Q^T = \det [I - \|u\|\, e_1 v^T Q^T] = 1 - v^T Q^T e_1 \|u\|$$

which is just $1 - v^T u$ because of (A3.1). Now

$$\det X_{(i)}^T X_{(i)} = \det [(I - x_i^T x_i (X^T X)^{-1}) X^T X]$$

and letting $u = x_i^T$ and $v^T = x_i (X^T X)^{-1}$ completes the proof since $x_i (X^T X)^{-1} x_i^T = h_i$.
(We are indebted to David Gay for simplifying our original proof.)
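
Both determinant identities can be verified numerically in the smallest nontrivial case. The Python sketch below is illustrative only, assuming two-dimensional vectors for the lemma, a two-parameter model with rows x_i = (1, x) for the theorem, invented numbers, and hypothetical function names.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def xtx(xs):
    """X'X for rows x_i = (1, x)."""
    return [[float(len(xs)), sum(xs)], [sum(xs), sum(x * x for x in xs)]]

def hat_diag(xs, i):
    """h_i = x_i (X'X)^{-1} x_i', written out for the 2x2 case."""
    a = xtx(xs)
    x = xs[i]
    return (a[1][1] - 2.0 * a[0][1] * x + a[0][0] * x * x) / det2(a)

# Lemma: det(I - u v') = 1 - v'u, checked for arbitrary two-dimensional u and v.
u, v = [0.3, -1.2], [2.0, 0.5]
i_minus_uv = [[1.0 - u[0] * v[0], -u[0] * v[1]],
              [-u[1] * v[0], 1.0 - u[1] * v[1]]]
lemma_err = abs(det2(i_minus_uv) - (1.0 - (u[0] * v[0] + u[1] * v[1])))

# Theorem: det(X(i)'X(i)) = (1 - h_i) det(X'X), checked for every row of invented data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
full_det = det2(xtx(xs))
det_err = max(
    abs(det2(xtx(xs[:i] + xs[i + 1:])) - (1.0 - hat_diag(xs, i)) * full_det)
    for i in range(len(xs))
)
```

A row with h_i close to one therefore leaves a nearly singular cross-product matrix behind when it is deleted, which is another way of seeing why such rows dominate the estimates.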


Exhibit No.  Title

1      Assignments of Row Indices to Countries
2      Data
3      Ordinary Least Squares Regression Results
4      Normal Probability Plot of Studentized Residuals
5      Studentized Residuals
6      Diagonal Elements of the Hat Matrix
7      NDFBETAS: Square Roots of the Sum of Squares of the [...]
8-11   DFBETAS (for individual coefficients)
23     NDFBETAS with Observation 49 Removed